Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Nadav Har'El
On Thu, Dec 01, 2011, Elazar Leibovich wrote about "Unix History: Why does
hexdump default to word alignment?":
 The default behaviour of hexdump is to align data word-wide. For instance

Just as a comment, if I remember correctly, hexdump isn't actually
part of ancient Unix history - the original was od, which as its name
suggests dumps in *octal*, but had an option od -x to see hexadecimal.

In any case, od and hexdump are very similar, and apparently have the
same idiosyncrasies, as od -x also defaults to two-byte words.

 printf '\xFF\xFF\x01' | hexdump
 0000000 ffff 0001
 0000003
 
 This makes little sense to me. In C, structs are not necessarily aligned to
 words, and it doesn't seem useful for viewing just about any data format,
 unless you're sure everything is word-aligned. The hexdump -C behaviour makes
 much more sense in the general case.

When you say "words" and "word aligned" here, you mean historic 2-byte words.
This is indeed *NOT* a very useful default on any modern computer. On some
old computers, like the PDP-11, 2-byte words were common and useful.
On other old computers, this was never a useful default.

I guess nobody cares because, since the 1970s when these tools were
written, nobody uses them any more ;-) I don't think I've used od in at
least two decades... at least not since less was invented, which usually does
the right thing (shows ASCII when possible, or hex for nonvisible
characters).

Amazingly, I don't believe that the original od even had an option to
see hex for each byte: od -c didn't show hex, od -x showed hex for
each two bytes, and od -b (for bytes) showed each byte, but in octal
(which evidently was more popular than hex in the old days).

GNU's od can do what you want with od -t x1. As you saw, so can
hexdump with the -C flag.
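
For example, on a modern Linux box (the exact column spacing of the output
may differ slightly between versions):

  $ printf '\xFF\xFF\x01' | od -t x1
  0000000 ff ff 01
  0000003

  $ printf '\xFF\xFF\x01' | hexdump -C
  00000000  ff ff 01                                          |...|
  00000003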

-- 
Nadav Har'El|   Thursday, Dec 1 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Make it idiot proof and someone will make
http://nadav.harel.org.il   |a better idiot.

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Elazar Leibovich
On Thu, Dec 1, 2011 at 10:10 AM, Nadav Har'El n...@math.technion.ac.il wrote:


 When you say "words" and "word aligned" here, you mean historic 2-byte
 words.


Indeed. Is there any other meaning for "word" other than two bytes?


 This is indeed *NOT* a very useful default on any modern computer. On some
 old computers, like the PDP-11, 2-byte words were common and useful.


I'm still not convinced that it was a useful default, since C, which is the
lingua franca of Unix, was clearly byte-based.


 I guess nobody cares because, since the 1970s when these tools were
 written, nobody uses them any more ;-) I don't think I've used od in at
 least two decades...


Well, I use them if I need to quickly inspect a file in binary format when
I'm already using the command line. Say I have a unit test that
implements a binary protocol, and I want to verify with my own eyes that I'm
getting the right results. ./generate_msg | hexdump -C is quicker than
./generate_msg > tmp && sane_hex_editor tmp. How do you do that without
hexdump, if you actually have this need at all?
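
For example, with a made-up generate_msg that writes a little-endian type
and length field followed by a payload, something like

  $ ./generate_msg | hexdump -C
  00000000  02 00 05 00 68 65 6c 6c  6f                      |....hello|
  00000009

makes it immediately obvious whether the header fields and payload are what
I expect.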

But maybe it's just a bad old habit of mine. I guess that if you get used
to a more modern workflow, you can use modern tools to inspect the
same data just as quickly. As you can understand, less will not help me
with that.
___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread guy keren

On 12/01/2011 10:10 AM, Nadav Har'El wrote:

On Thu, Dec 01, 2011, Elazar Leibovich wrote about "Unix History: Why does hexdump
default to word alignment?":

The default behaviour of hexdump is to align data word-wide. For instance


Just as a comment, if I remember correctly, hexdump isn't actually
part of ancient Unix history - the original was od, which as its name
suggests dumps in *octal*, but had an option od -x to see hexadecimal.

In any case, od and hexdump are very similar, and apparently have the
same idiosyncrasies, as od -x also defaults to two-byte words.


 printf '\xFF\xFF\x01' | hexdump
 0000000 ffff 0001
 0000003

This makes little sense to me. In C, structs are not necessarily aligned to
words, and it doesn't seem useful for viewing just about any data format,
unless you're sure everything is word-aligned. The hexdump -C behaviour makes
much more sense in the general case.


When you say "words" and "word aligned" here, you mean historic 2-byte words.
This is indeed *NOT* a very useful default on any modern computer. On some
old computers, like the PDP-11, 2-byte words were common and useful.
On other old computers, this was never a useful default.

I guess nobody cares because, since the 1970s when these tools were
written, nobody uses them any more ;-) I don't think I've used od in at
least two decades... at least not since less was invented, which usually does
the right thing (shows ASCII when possible, or hex for nonvisible
characters).

Amazingly, I don't believe that the original od even had an option to
see hex for each byte: od -c didn't show hex, od -x showed hex for
each two bytes, and od -b (for bytes) showed each byte, but in octal
(which evidently was more popular than hex in the old days).

GNU's od can do what you want with od -t x1. As you saw, so can
hexdump with the -C flag.



apparently, you did not use binary data serialization in the past two 
decades. when you serialize data and store it into a file (also on the 
network), it is very useful to be able to see the data as 2-byte or 
4-byte or whatever-byte numbers, when debugging.
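
for example, for a buffer holding two little-endian 32-bit integers (the
exact od column spacing may vary a bit):

  $ printf '\x01\x00\x00\x00\xff\xff\xff\xff' | od -t x4
  0000000 00000001 ffffffff
  0000010
  $ printf '\x01\x00\x00\x00\xff\xff\xff\xff' | od -t d4
  0000000           1          -1
  0000010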


in the last few years, i have been using od more than i did in the
decade before that ;)


--guy

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread geoffrey mendelson


On Dec 1, 2011, at 10:28 AM, Elazar Leibovich wrote:


Indeed. Is there any other meaning for word other than two bytes?

This is indeed *NOT* a very useful default on any modern computer. On some
old computers, like the PDP-11, 2-byte words were common and useful.


Well, let's see, going back to the 1960s: the IBM 1401's word size was set by
a bit in memory, a "word mark" on a digit.


IBM 360: 32-bit words. IBM 1130: 16-bit words. HP 2000/2100: 16-bit
words. CDC 6400/6600 (the basis for Cray): 60-bit words. The Burroughs 5500
series used a 48-bit word. The Philco 2000 (I actually used a 1000 or
1100, I can't remember, which was a smaller version): a 48-bit word. The
SDS 940 used a 24-bit word.


I was too late to use Quicktran (it was used up until June of the year
I started 10th grade, but I did not start until September), which ran
on an IBM 7044 with a 36-bit word.


Another nearby high school, which I did not go to, had a PDP-8 with a 12-bit
word.



Those were the ones I can remember having used in the time frame 1969  
to 1972.


The original Unics (later UNIX) machine was a PDP-7 with an 18-bit word.

PDP-11s were relative latecomers to the game, first released in 1970.
The PDP-11 offered Unix from the start, but most PDP-11s ran a much
less demanding operating system.


Geoff.

--
Geoffrey S. Mendelson,  N3OWJ/4X1GM
My high blood pressure medicine reduces my midichlorian count. :-(

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Elazar Leibovich
On Thu, Dec 1, 2011 at 11:32 AM, geoffrey mendelson 
geoffreymendel...@gmail.com wrote:


 Well, let's see, going back to the 1960s: the IBM 1401's word size was set by a
 bit in memory, a "word mark" on a digit.


Thanks for educating me; you should get a job in CS archaeology.
But what did the word mark mean? In my ignorance I thought that "word" was
meant to imply a number of bits.
___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Fwd: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread geoffrey mendelson


On Dec 1, 2011, at 12:00 PM, Elazar Leibovich wrote:

But what did the word mark mean? In my ignorance I thought that "word" was
meant to imply a number of bits.


The IBM 1401 and similar series of computers used DECIMAL, not binary,
numbers, and the word mark was an extra bit turned on to indicate the
end of a word. Actually, the word mark was at the beginning of a word, so
the end was really the word mark of the next word after it.


So if you had the number 123456789 in memory and you wanted to address
it, the one would be at the low memory address with the word mark bit
turned on, and the nine at the high end. You would point the
instruction at the high address (that of the nine).



If I remember correctly, instructions addressed the ones digit in a
number, so you could specify as many digits in a word as you needed.
This was common in 1950s-vintage computers, as business computers had
decimal instructions and scientific ones had binary instructions (integer,
with the option of floating point on some computers).


The IBM 360 was, AFAIK, the first to have both. It used instructions
with the decimal length in them, so although binary words were 32-bit,
decimal ones, if you want to call them words at all, were up to 31
digits plus a sign (1-16 bytes).


Turbo Pascal for the IBM PC had a decimal mode where it would store
numbers as decimal digits and do decimal arithmetic on them. I never  
used TP, so I don't know much more about it. Any Pascal programmers  
out there? Do Linux Pascal compilers have it?


Geoff.

--
Geoffrey S. Mendelson,  N3OWJ/4X1GM
My high blood pressure medicine reduces my midichlorian count. :-(

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Oleg Goldshmidt
On Thu, Dec 1, 2011 at 12:25 PM, geoffrey mendelson
geoffreymendel...@gmail.com wrote:


 Turbo Pascal for the IBM PC had a decimal mode where it would store numbers
 as decimal digits and do decimal arithmetic on them. I never used TP, so I
 don't know much more about it. Any Pascal programmers out there?

My first paid software job was programming in Turbo Pascal on IBM PCs,
20-something years ago. Late '80s... Before Linux even existed as a
concept... Oh, my.

I *think* TP had BCD (binary-coded decimal; google or check Wikipedia
if you are interested). Basically, in BCD every decimal digit is
coded separately as a 4-bit binary number. IIRC, all x86 processors
provided BCD-related instructions (conversions to and from), but I
think even then it was slower than straightforward binary arithmetic.
It was slow because the machine instructions were for single bytes
only, not for wider objects. I do not remember if it was in any
way related to the 8087 math co-processor.

I don't recall ever using BCD explicitly, but I may have been too
inexperienced to notice. Never programmed in Pascal since.

-- 
Oleg Goldshmidt | p...@goldshmidt.org

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread geoffrey mendelson


On Dec 1, 2011, at 12:51 PM, Oleg Goldshmidt wrote:


I don't recall ever using BCD explicitly, but I may have been too
inexperienced to notice. Never programmed in Pascal since.



Oleg,

One used BCD for money. I once worked at a place where one of the
programmers wrote the pension reporting programs for the IBM 370 in PL/I
using floating point arithmetic. When people saw the reports and
noticed that they had strangely rounded balances in their accounts,
the whole thing was scrapped and re-written using decimal numbers.


Geoff.
--
Geoffrey S. Mendelson,  N3OWJ/4X1GM
My high blood pressure medicine reduces my midichlorian count. :-(

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Oleg Goldshmidt
On Thu, Dec 1, 2011 at 12:58 PM, geoffrey mendelson
geoffreymendel...@gmail.com wrote:

 One used BCD for money. I once worked at a place where one of the
 programmers wrote the pension reporting programs for the IBM 370 in PL/I
 using floating point arithmetic. When people saw the reports and noticed
 that they had strangely rounded balances in their accounts, the whole thing
 was scrapped and re-written using decimal numbers.

Oh, I know that. I also know a bit about the dangers of floating point
arithmetic.

That job of mine had nothing to do with money though (unlike some of
the subsequent ones).

By the way, eliminating rounding errors is the primary reason why
beasts like Java's BigDecimal exist today. True to its (ugly) form,
Java does not allow operator overloading, so BigDecimal arithmetic is
implemented via method calls, and even simple formulas look unparseable
to the naked eye in code. [Just venting workplace-related frustration
here, sorry... ;-)]

-- 
Oleg Goldshmidt | o...@goldshmidt.org

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Omer Zak
On Thu, 2011-12-01 at 12:51 +0200, Oleg Goldshmidt wrote:
 IIRC, all x86 processors
 provided BCD-related instructions (conversions to and from), but I
 think even then it was slower than straightforward binary arithmetic.
 It was slow because the machine instructions were for single bytes
 only, but not for wider objects. I do not remember if it was in any
 way related to 8087 math co-processors.

The 8086 had several instructions for adjusting results of arithmetic
operations to conform to BCD values.
The 8087 had the FBLD and FBSTP instructions for loading and storing BCD values.

--- Omer


-- 
My Commodore 64 is suffering from slowness and insufficiency of memory;
and its display device is grievously short of pixels.  Can anyone help?
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html


___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Nadav Har'El
On Thu, Dec 01, 2011, guy keren wrote about "Re: Unix History: Why does hexdump
default to word alignment?":
 apparently, you did not use binary data serialization in the past
 two decades. when you serialize data and store it into a file (also
 on the network), it is very useful to be able to see the data as
 2-byte or 4-byte or whatever-byte numbers, when debugging.

Well, for debugging you typically use tools like a debugger (gdb, ddd,
etc.) or network sniffer or something - and those have their own methods
of displaying data, and do not use od. So using the actual od command
in a shell or shell-script is not something I ended up doing in recent years.
I don't think I even noticed the new hexdump sibling of od cropped up
in Linux ;-)

-- 
Nadav Har'El|   Thursday, Dec 1 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |He who dies with the most toys is still
http://nadav.harel.org.il   |dead -- Citibank billboard, Manhattan 2001

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Yedidyah Bar-David
On Thu, Dec 01, 2011 at 01:55:24PM +0200, Nadav Har'El wrote:
 On Thu, Dec 01, 2011, guy keren wrote about "Re: Unix History: Why does
 hexdump default to word alignment?":
  apparently, you did not use binary data serialization in the past
  two decades. when you serialize data and store it into a file (also
  on the network), it is very useful to be able to see the data as
  2-byte or 4-byte or whatever-byte numbers, when debugging.
 
 Well, for debugging you typically use tools like a debugger (gdb, ddd,
 etc.) or network sniffer or something - and those have their own methods
 of displaying data, and do not use od. So using the actual od command
 in a shell or shell-script is not something I ended up doing in recent years.
 I don't think I even noticed the new hexdump sibling of od cropped up
 in Linux ;-)

Regarding new siblings of od, and just in case someone expects a useful
piece of information in this thread - I have used xxd several times;
it can also do the reverse (convert its output back to binary), so you
can use it together with your $EDITOR as a poor man's binary editor.
I guess people used uuencode/uudecode for this in the past, perhaps I
did too, but xxd is much more comfortable.
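
Something like this (the file name is made up):

  $ xxd blob.bin > blob.hex      # hex + ASCII dump, editable as plain text
  $ $EDITOR blob.hex             # change the bytes you care about
  $ xxd -r blob.hex > blob.bin   # -r reverses the dump back to binary
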
-- 
Didi


___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread guy keren

On 12/01/2011 01:55 PM, Nadav Har'El wrote:

On Thu, Dec 01, 2011, guy keren wrote about "Re: Unix History: Why does hexdump
default to word alignment?":

apparently, you did not use binary data serialization in the past
two decades. when you serialize data and store it into a file (also
on the network), it is very useful to be able to see the data as
2-byte or 4-byte or whatever-byte numbers, when debugging.


Well, for debugging you typically use tools like a debugger (gdb, ddd,
etc.) or network sniffer or something - and those have their own methods
of displaying data, and do not use od. So using the actual od command
in a shell or shell-script is not something I ended up doing in recent years.
I don't think I even noticed the new hexdump sibling of od cropped up
in Linux ;-)



you can use a debugger only for the basic code. you cannot use a 
debugger when you're dealing with multiple threads that access the same 
shared data and could have race conditions. in those cases you need to 
run a test, find that the eventual data is incorrect, and track back 
using logs and friends, to find the culprit(s).


this is the common case in storage systems - but also in other types of 
systems.


--guy

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Unix History: Why does hexdump default to word alignment?

2011-12-01 Thread Elazar Leibovich
On Fri, Dec 2, 2011 at 9:28 AM, guy keren c...@actcom.co.il wrote:

 you can use a debugger only for the basic code. you cannot use a debugger
 when you're dealing with multiple threads that access the same shared data
 and could have race conditions. in those cases you need to run a test, find
 that the eventual data is incorrect, and track back using logs and friends,
 to find the culprit(s).


I think that what Nadav meant is that instead of adding

log_raw_data_to_file(file);

you can set a breakpoint there and watch the data with gdb's x command.
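
Something along these lines (the function and buffer names are made up, of
course; x/16xb prints 16 bytes of the buffer in hex):

  $ gdb ./generate_msg
  (gdb) break send_msg
  (gdb) run
  (gdb) x/16xb buf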

Like you, I find the printf-debugging approach more appealing, but it might
be that I'm just stuck in the past and reluctant to try new tools.



 this is the common case in storage systems - but also in other types of
 systems.


Another case I ran into is passing binary data from process A to
process B using pipes; it's much quicker to inspect the actual data with
./proc | hexdump -C than by redirecting the output to a file or tapping
the network/pipe.
___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il