Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Simon P Smith
On 17/01/2011 12:30, d-...@hadrian-way.co.uk wrote:

  

 I just spotted this
 http://en.wikipedia.org/wiki/Cat_%28Unix%29#Binary_use.
 Is this working because the .gz file is treated as a binary?

  
Cat cares not one jot about the file format, it just sticks the bits
together AFAIK.
As I said earlier, gzip files can be concatenated as they are a
sequential list of members.  If you stitch two together, then when the
result is processed gzip just happily finds the next member header after
the end of the original file and continues processing.


--
Next meeting:  Bournemouth, Tuesday 2011-02-01 20:00
Meets, Mailing list, IRC, LinkedIn, ...  http://dorset.lug.org.uk/
How to Report Bugs Effectively:  http://goo.gl/4Xue


Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Ralph Corderoy

Hi Terry,

 Now my understanding of cat goes back to a noddy Unix course about
 15-20 years ago, but I always thought the 'cat' stood for 'catalogue'
 and was used to list the content of a text file.  Having looked at
 http://unixhelp.ed.ac.uk/CGI/man-cgi?cat though, I now know that it
 stands for concatenate

That man page is wrong!  It doesn't stand for concatenate, else it would
be con(1), not cat(1).  :-)  It stands for catenate, always has done.
Here's the man page from the 7th Edition of Unix from Bell Labs.

wget -qO- http://www.cs.bell-labs.com/7thEdMan/vol1/man1.bun |
sed -n '/^-\.TH CAT/,/GO\.SYSIN DD/s/-//p' |
nroff -man

 So now I've got that out of the way, can someone explain what cat
 actually does with a compressed archive?  I assume it doesn't
 understand the content, so is it simply stitching the two together
 in dumb fashion?

Yes.  cat simply catenates all the files specified to its standard
output, or reads standard input if no files are given.  *With no
options* it doesn't care about, look at, or interpret the files'
content.
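
A quick way to convince yourself, as a minimal sketch only: the file
names a.bin, b.bin and both.bin are made up for illustration, and
head -c is assumed to be available (it is on GNU and BSD systems).  It
builds two files of arbitrary bytes, NUL included, catenates them, and
checks the result is the first file followed byte-for-byte by the second.

```shell
# Work in a scratch directory; a.bin, b.bin, both.bin are hypothetical names.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two files of arbitrary bytes, including a NUL and a non-ASCII byte.
printf 'one\000\377' >a.bin
printf '\001two\n' >b.bin

# cat just copies the bytes of each file, in order, to standard output.
cat a.bin b.bin >both.bin

# $(( )) strips the padding some wc implementations put around the count.
size_a=$(($(wc -c <a.bin)))
size_b=$(($(wc -c <b.bin)))
size_both=$(($(wc -c <both.bin)))
echo "$size_a + $size_b = $size_both"

# The start of both.bin is identical to a.bin, the remainder to b.bin.
head -c "$size_a" both.bin | cmp - a.bin && echo "head matches a.bin"
tail -c "$size_b" both.bin | cmp - b.bin && echo "tail matches b.bin"
```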

 If so, how would it be used to 'overlay' the tinycore.gz contents as
 is being suggested?

Simon P Smith wrote:
 AFAIK gzip can have additional bits added since it is a collection of
 members which have a header and trailer which are sequentially
 processed.  ...  Concatenating gz files should result in a valid file.

Simon's right.  See ADVANCED USAGE in gzip(1);  it's documented.

$ (echo hello | gzip; printf 'world!\n' | gzip) | gunzip
hello
world!
$

Tim Waugh wrote:
 The secret to this is that the gzip format is clever enough to work
 correctly when two gzip files are simply concatenated.

 I don't know the details of why it works, but I believe it's something
 to do with the streaming nature of gzip, compared with e.g.
 block-based compression such as bzip2.

http://www.ietf.org/rfc/rfc1952.txt gives the file format;  see 2.2.
It's simply that a decompressor can tell when it's got to the end of the
current compressed file, realises it's not yet at the end of the input,
reads a little bit more and expects it to be the header for another
whole compressed file.
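
The same thing with files on disk, as a rough sketch rather than
anything definitive: one.gz, two.gz and both.gz are names I've made up,
and the last step assumes each member starts with the two ID bytes
1f 8b that RFC 1952 gives in its description of the member format.

```shell
# Scratch directory; one.gz, two.gz and both.gz are hypothetical names.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'hello\n' | gzip >one.gz
printf 'world!\n' | gzip >two.gz

# Plain byte-wise catenation; cat knows nothing about gzip.
cat one.gz two.gz >both.gz

# gzip -t tests the integrity of the whole file, both members included.
gzip -t both.gz && echo "both.gz passes gzip -t"

# Decompressing gives the two original inputs, one after the other.
result=$(gunzip -c both.gz)
echo "$result"

# The second member's header begins exactly where one.gz ended, so the
# two bytes at that offset should be the gzip ID bytes again.
offset=$(($(wc -c <one.gz)))
second_magic=$(tail -c +"$((offset + 1))" both.gz | head -c 2 |
    od -An -tx1 | tr -d ' \n')
echo "bytes at offset $offset: $second_magic"
```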

There is no carry-over of the dictionary from the first compression to
the second so the nature of the compression method compared to bzip2
isn't the reason this is possible.  In fact, bzip2(1) does it too.

$ (echo hello | bzip2; printf 'world!\n' | bzip2) | bunzip2
hello
world!
$

That lack of carry-over is why catenating compressed files gives worse
overall compression than giving them all to one compressor to do,
preferably with similar files sorted to be near one another.

$ f=/etc/passwd
$ (gzip <$f; gzip <$f) | wc -c
1578
$ cat $f $f | gzip | wc -c
817
$

Back to Terry:
 My flawed experience told me that cat was used with text files and the
 man page I found earlier certainly didn't make it clear that any files
 could be cat'd together.  Whether that makes any sense clearly depends
 on what those files are, but having understood that fact I was able to
 get to the next step; gzip files wouldn't be broken because of the way
 they are structured.

Unix doesn't fundamentally distinguish between text and binary files at
the kernel level as other OSes do.  They're just a sequence of zero or
more bytes.  Unless stated otherwise, assume a command doesn't care
whether the bytes could be considered as a LF-terminated sequence of
zero or more lines of printable bytes.  Not having the text/binary
distinction is quite an advantage compared to, e.g. DOS, which also has
its ASCII SUB, Ctrl-Z, file terminating byte;  awful, mixing data and
metadata.

Cheers,
Ralph.




[Dorset] Fwd: Power supply fault?

2011-01-17 Thread Walter



Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Terry Coles
On Monday 17 Jan 2011, Ralph Corderoy wrote:
  Now my understanding of cat goes back to a noddy Unix course about
  15-20 years ago, but I always thought the 'cat' stood for 'catalogue'
  and was used to list the content of a text file.  Having looked at
  http://unixhelp.ed.ac.uk/CGI/man-cgi?cat though, I now know that it
  stands for concatenate
 
 That man page is wrong!  It doesn't stand for concatenate, else it would
 be con(1), not cat(1).  :-)  It stands for catenate, always has done.
 Here's the man page from the 7th Edition of Unix from Bell Labs.
 
 wget -qO- http://www.cs.bell-labs.com/7thEdMan/vol1/man1.bun |
 sed -n '/^-\.TH CAT/,/GO\.SYSIN DD/s/-//p' |
 nroff -man

I never even knew that catenate was a word ;-)

 Back to Terry:
  My flawed experience told me that cat was used with text files and the
  man page I found earlier certainly didn't make it clear that any files
  could be cat'd together.  Whether that makes any sense clearly depends
  on what those files are, but having understood that fact I was able to
  get to the next step; gzip files wouldn't be broken because of the way
  they are structured.
 
 Unix doesn't fundamentally distinguish between text and binary files at
 the kernel level as other OSes do.  They're just a sequence of zero or
 more bytes.  Unless stated otherwise, assume a command doesn't care
 whether the bytes could be considered as a LF-terminated sequence of
 zero or more lines of printable bytes.  Not having the text/binary
 distinction is quite an advantage compared to, e.g. DOS, which also has
 its ASCII SUB, Ctrl-Z, file terminating byte;  awful, mixing data and
 metadata.

I knew that really :-)  The trouble is that I had only ever used cat with text 
files, so it never even occurred to me, in this context, that a file is a file 
and it doesn't matter what is in it.

-- 
Terry Coles
64 bit computing with Kubuntu Linux




Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Tim Allen

On 17/01/11 17:22, Terry Coles wrote:

 On Monday 17 Jan 2011, Ralph Corderoy wrote:
   Now my understanding of cat goes back to a noddy Unix course about
   15-20 years ago, but I always thought the 'cat' stood for 'catalogue'
   and was used to list the content of a text file.  Having looked at
   http://unixhelp.ed.ac.uk/CGI/man-cgi?cat though, I now know that it
   stands for concatenate

  That man page is wrong!  It doesn't stand for concatenate, else it would
  be con(1), not cat(1).  :-)  It stands for catenate, always has done.
  Here's the man page from the 7th Edition of Unix from Bell Labs.

  wget -qO- http://www.cs.bell-labs.com/7thEdMan/vol1/man1.bun |
  sed -n '/^-\.TH CAT/,/GO\.SYSIN DD/s/-//p' |
  nroff -man

 I never even knew that catenate was a word ;-)

It means the same as concatenate :)

Tim



Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Victor Churchill
On 17 January 2011 18:01, Tim Allen t...@ls83.eclipse.co.uk wrote:
 On 17/01/11 17:22, Terry Coles wrote:
  I never even knew that catenate was a word ;-)

 It means the same as concatenate :)

 Tim


Moi non plus. I had always assumed that 'cat' was a perverted
short-form of 'concatenate' and that calling it that was just 'humour'
or laziness on the part of the early Unix designers.

(BTW, regarding the use of cat on non-text files ... fastest way to
clone a disk:

cat /dev/hda2 > /dev/hdb2

)

-- 
best regards,

Victor Churchill,
Bournemouth



Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Ralph Corderoy

Hi Tim,

  I never even knew that catenate was a word ;-)

 It means the same as concatenate :)

But it might make more sense to people, be more mnemonical, if its
proper name of catenate was used.  Then we wouldn't have things like "I
thought it stood for catalogue".  :-)

dict(1) says L. catenatus for catenate and L. concatenatus for
concatenate.  Perhaps there's more of a distinction in Latin?  Anyway,
given I often hear people complain about the difficulty in remembering
command names and options I think it's worth being pedantic for the
mnemonic benefit.  :-)

Cheers,
Ralph.




Re: [Dorset] Use of cat to concatenate .gz files

2011-01-17 Thread Dan Dart
cat works with mpg files too:

cat file1.mpg file2.mpg > file3.mpg

file3.mpg is a valid mpg file!
