bug#41518: Bug in od?

2020-05-30 Thread Andreas Schwab
On May 29 2020, Yuan Cao wrote:

> It just feels strange because the order does not reflect the order of the
> characters in the file.

But that's not true.  It reflects exactly how 2-byte numbers are stored
in memory on your system.  If you want to make a connection with
characters, you need to think about UCS-2 characters.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





bug#41518: Bug in od?

2020-05-29 Thread Bob Proulx
Yuan Cao wrote:
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
> 
> Thanks for pointing me to this documentation.
> 
> It just feels strange because the order does not reflect the order of the
> characters in the file.

It feels strange in the environment *today*.  But in the 1970s, when
'od' was written, it was perfectly natural on the PDP-11 to print
out the native machine word in the *native word order* of the PDP-11.
During that time most software operated on the native architecture and
the idea of being portable to other systems was not yet common.

The PDP-11 is a 16-bit word machine.  Therefore what you are seeing
with the 2-byte integer is the order in which it was printed on the
PDP-11 system.  That behavior has remained unchanged to the present
day because it can't change without breaking all historical use.

For anyone using od today the best replacement for -x is -tx1, which
prints bytes one at a time, in file order, on any machine.  Whenever
you think to type in -x use -tx1 instead.  This avoids breaking
historical use and produces the output that you are wanting.

> I think it might have been useful to get the "by word" value of the file if
> you are working with a binary file historically. One might have stored some
> data as a list of shorts. Then, we can easily view the data using "od -x
> data_file_name".
> 
> Since memory is so cheap now, people are probably just using chars
> for text, and 4 byte ints or 8 byte ints where they used to use 2 byte ints
> (shorts) before. In this case, the "by word" order does not seem to me to
> be as useful and violates the principle of least astonishment needlessly.

But changing the meaning of a command's options is a hard problem and
cannot be done without breaking a lot of existing use.  The better way
is not to try.  The options to head and tail changed an eon ago and yet
just in the last week I ran across a posting where that change bit
someone.

And since there is no need for any breaking change it is better not to
do it.  Simply use the correct options for what you want.  -tx1 in
this case.

> It might be interesting to change the option to print values by double word
> or quadword instead or add another option to let the users choose to print
> by double word or quadword if they want.

And the size of 16-bits was a good value for yesteryear.  32-bits
has been a good size for some years.  Now 64-bits is the most common
size.  The only way to win is not to play.  Better to say the size
explicitly.  And IMNHO the best size is 1 regardless of architecture.

  od -Ax -tx1z -v

Each of those options has been added over the years and each changes
the behavior of the program.  Each of those would be a breaking change
if they were made the default.  Best to ask for what you want explicitly.
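
As a minimal sketch of asking for the size and the byte order explicitly:
od's -t already accepts a unit size, so double-word and quad-word output
are available as -t x4 and -t x8 without any new option.  This assumes a
GNU od that supports --endian and 8-byte units; the sample bytes and the
variable names are illustrative only.

```shell
#!/bin/sh
# Eight sample bytes, 0x01 through 0x08 (octal escapes are portable in printf).
sample=$(mktemp)
printf '\001\002\003\004\005\006\007\010' > "$sample"

# State both the unit size (-t xN) and the byte order (--endian) explicitly.
# -An drops the offset column so only the data remains.
x1=$(od -An -tx1 "$sample")                    # 01 02 03 04 05 06 07 08
x2=$(od -An -tx2 --endian=little "$sample")    # 0201 0403 0605 0807
x4=$(od -An -tx4 --endian=little "$sample")    # 04030201 08070605
x8=$(od -An -tx8 --endian=little "$sample")    # 0807060504030201
x8be=$(od -An -tx8 --endian=big "$sample")     # 0102030405060708

printf '%s\n' "$x1" "$x2" "$x4" "$x8" "$x8be"
rm -f "$sample"
```

Only the -tx1 line is independent of --endian; every larger unit is
reordered according to the byte order that was requested.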

I strongly recommend https://www.ietf.org/rfc/ien/ien137.txt as
required reading.

Bob





bug#41518: Bug in od?

2020-05-29 Thread Yuan Cao
On Fri, May 29, 2020 at 1:20 AM Bob Proulx wrote:

> A little more information.
>
> Pádraig Brady wrote:
> > Yuan Cao wrote:
> > > I recently came across the following behavior.
> > >
> > > When using "--traditional x2" or "-x" option, it seems the order of hex
> > > code output for the characters is pairwise reversed (if that's the correct
> > > way of describing it).
>
> ‘-x’
>  Output as hexadecimal two-byte units.  Equivalent to ‘-t x2’.
>
> Outputs 16-bit integers in the *native byte order* of the machine.
> Which may be either big-endian or little-endian depending on the
> machine.  Not portable.  Depends upon the machine it is run upon.
>
> > If you want to hexdump independently of endianness you can:
> >
> >   od -Ax -tx1z -v
>
> The -tx1 option above is portable because it outputs 1-byte units
> instead of 2-byte units, which makes it independent of endianness.
>
> This is the FAQ entry for this topic.
>
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
>
> Bob
>

Thanks for pointing me to this documentation.

It just feels strange because the order does not reflect the order of the
characters in the file.

I think it might have been useful to get the "by word" value of the file if
you are working with a binary file historically. One might have stored some
data as a list of shorts. Then, we can easily view the data using "od -x
data_file_name".

Since memory is so cheap now, people are probably just using chars
for text, and 4 byte ints or 8 byte ints where they used to use 2 byte ints
(shorts) before. In this case, the "by word" order does not seem to me to
be as useful and violates the principle of least astonishment needlessly.

It might be interesting to change the option to print values by double word
or quadword instead or add another option to let the users choose to print
by double word or quadword if they want.

Best Regards,

Yuan


bug#41518: Bug in od?

2020-05-28 Thread Bob Proulx
A little more information.

Pádraig Brady wrote:
> Yuan Cao wrote:
> > I recently came across the following behavior.
> > 
> > When using "--traditional x2" or "-x" option, it seems the order of hex
> > code output for the characters is pairwise reversed (if that's the correct
> > way of describing it).

‘-x’
 Output as hexadecimal two-byte units.  Equivalent to ‘-t x2’.

Outputs 16-bit integers in the *native byte order* of the machine.
Which may be either big-endian or little-endian depending on the
machine.  Not portable.  Depends upon the machine it is run upon.

> If you want to hexdump independently of endianness you can:
> 
>   od -Ax -tx1z -v

The -tx1 option above is portable because it outputs 1-byte units
instead of 2-byte units, which makes it independent of endianness.
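
As a rough illustration of what each part of that command does, assuming
GNU od: -Ax prints offsets in hexadecimal, -tx1 prints 1-byte hex units,
the trailing z appends the printable characters, and -v stops od from
collapsing repeated lines into "*".  The 48-byte sample and the variable
names are made up for this sketch.

```shell
#!/bin/sh
# 48 identical bytes: three 16-byte output lines with the same content.
sample=$(mktemp)
printf '%048d' 0 > "$sample"

# Without -v, od prints the first line, then a single "*" for the repeats.
od -Ax -tx1z "$sample"
collapsed=$(od -Ax -tx1z "$sample" | wc -l | tr -d ' ')

# With -v every line is printed in full.
od -Ax -tx1z -v "$sample"
full=$(od -Ax -tx1z -v "$sample" | wc -l | tr -d ' ')

# Expect 3 lines without -v (data, "*", final offset) and 4 with -v.
printf 'lines without -v: %s, with -v: %s\n' "$collapsed" "$full"
rm -f "$sample"
```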

This is the FAQ entry for this topic.

  
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e

Bob





bug#41518: Bug in od?

2020-05-25 Thread Pádraig Brady

tag 41518 notabug
close 41518
stop

response below...

On 25/05/2020 04:05, Yuan Cao wrote:

> Hello,
>
> I recently came across the following behavior.
>
> When using "--traditional x2" or "-x" option, it seems the order of hex
> code output for the characters is pairwise reversed (if that's the correct
> way of describing it).
>
> For example, using "od -cx" on a test file that contains "1234567890\n", you
> get the following output:
>
> 000   1   2   3   4   5   6   7   8   9   0  \n
>   3231  3433  3635  3837  3039  000a
> 013
>
> It seems like it should be the following instead:
>
> 000   1   2   3   4   5   6   7   8   9   0  \n
>   3132  3334  3536  3738  3930  0a00
> 013
>
> The version involved is od in GNU coreutils 8.28.


That's because you're on a little-endian machine.
If you want to reorder as per a big-endian machine you can:

  od --endian=big -cx your_file

If you want to hexdump independently of endianness you can:

  od -Ax -tx1z -v
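
The report can be reproduced on any host by forcing the byte order
instead of relying on the machine's own.  This is only a sketch: it
assumes a GNU od with --endian, recreates the 11-byte test file shown in
the output above, and uses -t x2 (what -x is equivalent to) with -An to
drop the offset column.  The variable names are illustrative.

```shell
#!/bin/sh
# Recreate the test file from the report: "1234567890" plus a newline.
sample=$(mktemp)
printf '1234567890\n' > "$sample"

little=$(od -An -tx2 --endian=little "$sample")   # 3231 3433 3635 3837 3039 000a
big=$(od -An -tx2 --endian=big "$sample")         # 3132 3334 3536 3738 3930 0a00
bytes=$(od -An -tx1 "$sample")                    # 31 32 33 34 35 36 37 38 39 30 0a

printf 'x2 little:%s\n' "$little"   # the output that was reported
printf 'x2 big:   %s\n' "$big"      # the output that was expected
printf 'x1:       %s\n' "$bytes"    # one byte per unit, no byte order involved
rm -f "$sample"
```

The last 2-byte unit is padded with a zero byte because the file has an
odd length, which is where 000a and 0a00 come from.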

cheers,
Pádraig





bug#41518: Bug in od?

2020-05-24 Thread Yuan Cao
Hello,

I recently came across the following behavior.

When using "--traditional x2" or "-x" option, it seems the order of hex
code output for the characters is pairwise reversed (if that's the correct
way of describing it).

For example, using "od -cx" on a test file that contains "1234567890\n", you
get the following output:

000   1   2   3   4   5   6   7   8   9   0  \n
 3231  3433  3635  3837  3039  000a
013

It seems like it should be the following instead:

000   1   2   3   4   5   6   7   8   9   0  \n
 3132  3334  3536  3738  3930  0a00
013

The version involved is od in GNU coreutils 8.28.

Best Regards,

Yuan