Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-04-06 Thread Giuseppe Scrivano
Gijs van Tulder writes:

> +2013-03-31  Gijs van Tulder
> +
> +	* warc.c: Correctly write the field length in the skip length field
> +	of .warc.gz files. (Following the GZIP spec in RFC 1952.)

thanks for the patch, I have just pushed it.

Giuseppe

Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-30 Thread Gijs van Tulder
Hi,

> It appears wget may be creating slightly malformed GZIP skip-length
> fields

I think that's correct: Wget doesn't write the subfield length in the "extra field" section of the header. After the subfield ID "sl" it should write the length LEN (see RFC 1952 [1]), but it doesn't. Luckily,
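
For reference, the RFC 1952 layout discussed in this message can be sketched in a few lines of Python. This is only an illustration, not Wget's code: the helper names (`build_subfield`, `build_header`, `parse_subfields`) are hypothetical, and the 8-byte payload is a placeholder rather than a statement of what Wget stores in its "sl" subfield. The sketch shows that each subfield is SI1, SI2, a 2-byte little-endian LEN, then LEN bytes of data, and why a strict parser fails when LEN is omitted.

```python
import struct

FEXTRA = 0x04  # FLG bit 2 in RFC 1952: an "extra field" follows the fixed header


def build_subfield(si1: bytes, si2: bytes, data: bytes) -> bytes:
    """One subfield: SI1, SI2, LEN (2 bytes, little-endian), then LEN bytes of data."""
    return si1 + si2 + struct.pack("<H", len(data)) + data


def build_header(extra: bytes) -> bytes:
    """Fixed 10-byte gzip header with FEXTRA set, followed by XLEN and the extra field."""
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    fixed = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    return fixed + struct.pack("<H", len(extra)) + extra


def parse_subfields(header: bytes) -> dict:
    """Return {subfield ID: data}; raise ValueError if the extra field is inconsistent."""
    if not header[3] & FEXTRA:
        return {}
    (xlen,) = struct.unpack_from("<H", header, 10)
    pos, end = 12, 12 + xlen
    fields = {}
    while pos < end:
        if end - pos < 4:
            raise ValueError("truncated subfield header")
        subfield_id = header[pos:pos + 2]
        (length,) = struct.unpack_from("<H", header, pos + 2)
        pos += 4
        if pos + length > end:
            raise ValueError("subfield length overruns the extra field")
        fields[subfield_id] = header[pos:pos + length]
        pos += length
    return fields


# Placeholder payload (an assumption for this example): two 32-bit values.
payload = struct.pack("<II", 1234, 5678)

well_formed = build_header(build_subfield(b"s", b"l", payload))
missing_len = build_header(b"sl" + payload)  # ID directly followed by data, no LEN
```

With these definitions, `parse_subfields(well_formed)` returns the payload under the key `b"sl"`, while `parse_subfields(missing_len)` raises `ValueError`, because the first two payload bytes are misread as LEN. A more lenient reader could still skip the whole extra field using XLEN alone.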

Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-30 Thread Andy Jackson
Tim Rühsen writes:

> Unzipping it and zipping it again results in a 2387 byte file.
>
> So, for a first glimpse, it looks like Wget compresses very suboptimal.
> But I won't say it is a bug before I take a deeper look... (in the next days).

That's probably working as intended. By conven

Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-30 Thread Tim Rühsen
On Friday, 29 March 2013, Andy Jackson wrote:

> When using wget 1.14 to generate warc.gz files, e.g.
>
> wget -O tempname --warc-file="output" "http://example.com"
>
> the files this creates do not play back well using the Internet Archive's
> warc.gz parsers, throwing errors like
>
> "Inva

[Bug-wget] wget 1.14 possibly writing off-spec warc.gz files

2013-03-29 Thread Andy Jackson
When using wget 1.14 to generate warc.gz files, e.g.

wget -O tempname --warc-file="output" "http://example.com"

the files this creates do not play back well using the Internet Archive's warc.gz parsers, throwing errors like "Invalid FExtra length/records". It appears wget may be creating sl