On 08/05/2013 04:29 PM, JP Verkamp wrote:
Is there a nice / idiomatic way to work with gzipped data in a streaming
manner (to avoid loading the rather large files into memory at once)? So
far as I can tell, my code isn't doing that. It hangs for a while on the
call to gunzip-through-ports, long enough to uncompress the entire file,
then reads are pretty quick afterwards.

Here's what I have thus far:

#lang racket

(require file/gunzip)

(define-values (pipe-from pipe-to) (make-pipe))
(with-input-from-file "test.rkt.gz"
   (lambda ()
     (gunzip-through-ports (current-input-port) pipe-to)
     (for ([line (in-lines pipe-from)])
       (displayln line))))

You should probably 1) limit the size of the pipe (to stop it from inflating the whole file at once) and 2) put the gunzip-through-ports call in a separate thread. The gunzip thread will block when the pipe is full; when your program reads some data out of the pipe, the gunzip thread will be able to make some more progress. Something like this:

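;; Cap the pipe at 4000 bytes so the writer blocks instead of buffering everything.
(define-values (pipe-from pipe-to) (make-pipe 4000))
(with-input-from-file "test.rkt.gz"
  (lambda ()
    ;; Inflate in a separate thread; it stalls whenever the pipe is full.
    (thread
      (lambda ()
        (gunzip-through-ports (current-input-port) pipe-to)
        ;; Closing the write end is what lets in-lines see eof.
        (close-output-port pipe-to)))
    (for ([line (in-lines pipe-from)])
      (displayln line))))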

As an additional problem, that code doesn't actually work.
in-lines seems to be waiting for an eof-object? that
gunzip-through-ports isn't sending. Am I missing something? It ends up
just hanging after reading and printing the file.

The docs don't say anything about closing the port, so you'll probably have to do that yourself. In the code above, I added a call to close-output-port.
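
If you end up doing this in more than one place, the pieces (bounded pipe, gunzip in its own thread, closing the write end) can be folded into a small helper. What follows is only a sketch: open-gunzipped-input is a name I made up, not something file/gunzip provides, and the 4096-byte default limit is arbitrary. The demo compresses a short string in memory with gzip-through-ports from file/gzip so that it runs without needing a .gz file on disk.

```racket
#lang racket

(require file/gzip file/gunzip)

;; Hypothetical helper (not part of file/gunzip): returns an input port that
;; yields the decompressed contents of `in`. Inflation runs in a background
;; thread that blocks whenever the bounded pipe is full.
(define (open-gunzipped-input in [limit 4096])
  (define-values (pipe-from pipe-to) (make-pipe limit))
  (thread
   (lambda ()
     (dynamic-wind
      void
      (lambda () (gunzip-through-ports in pipe-to))
      ;; Close the write end even if inflation fails, so the reader
      ;; sees eof instead of hanging.
      (lambda () (close-output-port pipe-to)))))
  pipe-from)

;; Demo input: gzip three short lines into a byte string.
(define compressed
  (let ([out (open-output-bytes)])
    (gzip-through-ports (open-input-string "one\ntwo\nthree\n") out #f 0)
    (get-output-bytes out)))

;; Stream the lines back out through the helper.
(for ([line (in-lines (open-gunzipped-input (open-input-bytes compressed)))])
  (displayln line))
```

For a real file you would pass a port from open-input-file instead of the byte-string port, and close that port yourself once the loop is done.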

Ryan

____________________
 Racket Users list:
 http://lists.racket-lang.org/users
