On 10/30/2018 11:32 AM, 'Paulo Matos' via Racket Users wrote:
I have quite a few large files that I want to gzip to a single file
(without an intermediate concatenation) and then later gunzip.

I don't think you can do that, at least not without other software. gzip/gunzip are meant to work with a single file. gzip is a compression format, not an archive format: the compressed stream is assumed to contain a single object, and it has no index or other metadata for handling multiple objects. Typically you would tar the files and then gzip the tar file.
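
For what it's worth, here is a minimal sketch of the tar-then-gzip route done from Racket itself, assuming the file/tar and file/untgz libraries that ship with the main distribution. The names pack! and unpack!, the temporary directory, and the sample file contents are made up for illustration.

```racket
#lang racket
(require file/tar file/untgz)

;; Pack the given files into one gzip-compressed tar archive.
;; tar-gzip expects paths relative to the current directory.
(define (pack! archive paths)
  (apply tar-gzip archive paths))

;; Unpack every entry of the archive under the directory `dest`.
(define (unpack! archive dest)
  (untgz archive #:dest dest))

;; Demo in a scratch directory (hypothetical names and contents).
(define names '("foo1" "foo2" "foo3"))
(define dir (make-temporary-file "tgzdemo~a" 'directory))
(define dest (build-path dir "out"))
(make-directory dest)

(parameterize ([current-directory dir])
  (for ([name (in-list names)])
    (display-to-file (string-append "contents of " name "\n") name))
  (pack! "foo.tar.gz" names)
  (unpack! "foo.tar.gz" dest))
```

Unlike a bare gzip stream, the tar layer records each file's name and size, so the three files come back as three separate files under `dest`.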

Interestingly the gunzipping is blocking on a read-line. I wonder if
this is because I cannot use gzip-through-ports the way I am doing it or
if there's a bug somewhere.

Generate 3 files with:
$ base64 /dev/urandom | head -c 1000000 > foo3
$ base64 /dev/urandom | head -c 1000000 > foo2
$ base64 /dev/urandom | head -c 1000000 > foo1

Now run the code:
#lang racket

(require file/gzip
         file/gunzip)

(define paths '("foo1" "foo2" "foo3"))

;; compress
(printf "compressing~n")
(call-with-atomic-output-file "foo.gz"
   (lambda (op p)
     (for ([f (in-list paths)])
       (call-with-input-file f
         (lambda (i) (gzip-through-ports i op #false (current-seconds)))
         #:mode 'binary))))

;; decompress
(printf "decompressing~n")
(define-values (in out) (make-pipe))
;; gunzip runs in a background thread and feeds the pipe
(void
 (thread
  (lambda ()
    (call-with-input-file "foo.gz"
      (lambda (cin)
        (gunzip-through-ports cin out))
      #:mode 'binary))))
(call-with-atomic-output-file "foo.txt"
   (lambda (op p)
     (let loop ([l (read-line in)])
       (unless (eof-object? l)
         (write l op)
         (loop (read-line in))))))


This is going to block in a read-line, and I have a suspicion that it
blocks at the end of a compressed file. Is there a reason for it
blocking? Note that if you `zcat foo.gz | less` you can see the whole
file, so I am suspicious that something might be wrong with

Any suggestions to improve this?


I think you can recover by closing the pipe's output port when gunzip finishes, so that read-line sees EOF, but in that case you will receive only the first file; the additional files will be lost. If you want to send multiple objects in the same stream, you need some protocol to delimit them. I'm not sure how [or if] you can duplicate the port given to gunzip-through-ports, but you can read the port yourself and pass the data to gunzip.
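
A rough sketch of both ideas, in case it helps. The framing used here (an 8-byte big-endian length in front of each gzip stream) is just one protocol I picked for illustration, and gzip-bytes, gunzip-via-pipe, write-frame and read-frame are hypothetical helper names, not library functions. It works on in-memory byte strings to stay short.

```racket
#lang racket
(require file/gzip file/gunzip)

;; Compress a byte string into one standalone gzip stream.
(define (gzip-bytes bs)
  (define out (open-output-bytes))
  (gzip-through-ports (open-input-bytes bs) out #f 0)
  (get-output-bytes out))

;; Decompress through a pipe. Closing `out` once gunzip returns is
;; what lets the reading side see EOF instead of blocking forever.
(define (gunzip-via-pipe gz)
  (define-values (in out) (make-pipe))
  (thread (lambda ()
            (gunzip-through-ports (open-input-bytes gz) out)
            (close-output-port out)))
  (port->bytes in))

;; Assumed framing: 8-byte big-endian length, then the gzip data.
(define (write-frame gz out)
  (write-bytes (integer->integer-bytes (bytes-length gz) 8 #f #t) out)
  (write-bytes gz out)
  (void))

;; Returns the next frame's gzip data, or eof when nothing is left.
(define (read-frame in)
  (define header (read-bytes 8 in))
  (if (eof-object? header)
      eof
      (read-bytes (integer-bytes->integer header #f #t) in)))

;; Round trip of three objects through a single stream.
(define objects (list #"first\n" #"second\n" #"third\n"))
(define framed (open-output-bytes))
(for ([obj (in-list objects)])
  (write-frame (gzip-bytes obj) framed))

(define recovered
  (let ([in (open-input-bytes (get-output-bytes framed))])
    (for/list ([gz (in-producer (lambda () (read-frame in)) eof-object?)])
      (gunzip-via-pipe gz))))
```

Because the reader consumes exactly one frame at a time, each call to gunzip-through-ports only ever sees one compressed object, and the objects come back separately rather than as one concatenated blob.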


