date:20130330

Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-30 Thread Gijs van Tulder

Hi,

> It appears wget may be creating slightly malformed GZIP skip-length
> fields

I think that's correct: Wget doesn't write the subfield length in the "extra field" section of the header. After the subfield ID "sl" it should write the length LEN (see RFC 1952 [1]), but it doesn't. Luckily,
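For readers unfamiliar with the layout being discussed, here is a minimal sketch of how RFC 1952 structures the "extra field": a two-byte XLEN, then one or more subfields, each consisting of SI1, SI2, a two-byte little-endian LEN, and LEN bytes of data. This is not Wget's code. The function names `build_extra_field` and `parse_extra_field` are hypothetical, and the 8-byte payload (two 32-bit integers) is only an assumed example of what an "sl" subfield might carry.

```python
import struct

FEXTRA = 0x04  # FLG bit 2 in RFC 1952: an extra field follows the fixed header


def build_extra_field(subfield_id: bytes, data: bytes) -> bytes:
    """Return XLEN followed by one subfield: SI1, SI2, LEN, data."""
    if len(subfield_id) != 2:
        raise ValueError("subfield ID must be exactly two bytes")
    # LEN is the piece the message above says is missing from Wget's output.
    subfield = subfield_id + struct.pack("<H", len(data)) + data
    return struct.pack("<H", len(subfield)) + subfield


def parse_extra_field(header: bytes):
    """Yield (subfield_id, data) pairs from a gzip header with FEXTRA set."""
    id1, id2, _method, flags = struct.unpack_from("<BBBB", header, 0)
    if (id1, id2) != (0x1F, 0x8B) or not flags & FEXTRA:
        return
    (xlen,) = struct.unpack_from("<H", header, 10)  # XLEN follows the 10 fixed bytes
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        sid = header[pos:pos + 2]
        (length,) = struct.unpack_from("<H", header, pos + 2)
        if pos + 4 + length > end:
            raise ValueError("subfield LEN overruns XLEN")
        yield sid, header[pos + 4:pos + 4 + length]
        pos += 4 + length


# Fixed 10-byte header: ID1, ID2, CM=8 (deflate), FLG, MTIME, XFL, OS=255 (unknown).
fixed = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
payload = struct.pack("<II", 100, 250)  # assumed example payload, not from the thread
header = fixed + build_extra_field(b"sl", payload)
subfields = list(parse_extra_field(header))
```

If LEN is left out, a strict parser like the one sketched here reads the first two data bytes as the length and will typically reject the header or misread the subfield.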

Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-30 Thread Andy Jackson

Tim Rühsen writes:

> Unzipping it and zipping it again results in a 2387 byte file.
>
> So, for a first glimpse, it looks like Wget compresses very suboptimal.
> But I won't say it is a bug before I take a deeper look... (in the next days).

That's probably working as intended. By conven

Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-30 Thread Tim Rühsen

On Friday, 29 March 2013, Andy Jackson wrote:

> When using wget 1.14 to generate warc.gz files, e.g.
>
> wget -O tempname --warc-file="output" "http://example.com"
>
> the files this creates do not play back well using the Internet Archive's
> warc.gz parsers, throwing errors like
>
> "Inva