Gijs van Tulder writes:
> +2013-03-31 Gijs van Tulder
> +
> + * warc.c: Correctly write the field length in the skip length field
> + of .warc.gz files. (Following the GZIP spec in RFC 1952.)
Thanks for the patch, I have just pushed it.
Giuseppe
Hi,
> It appears wget may be creating slightly malformed GZIP skip-length
> fields
I think that's correct: Wget doesn't write the subfield length in the
"extra field" section of the header. After the subfield ID "sl" it
should write the length LEN (see RFC 1952 [1]), but it doesn't.
Luckily,
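
For reference, here is a minimal sketch of the "extra field" layout that RFC 1952 describes: a two-byte XLEN, then one or more subfields, each made of a two-byte ID (SI1, SI2), a two-byte little-endian LEN, and LEN bytes of data. The function names are hypothetical, and the 8-byte "sl" payload (two 32-bit sizes) is an assumption made for illustration, not something confirmed in this thread. The sketch is not wget's code; it only shows why a missing LEN makes a strict parser fail.

```python
import struct


def build_subfield(subfield_id: bytes, data: bytes) -> bytes:
    """Encode one RFC 1952 extra subfield: SI1 SI2, LEN (little-endian), data."""
    if len(subfield_id) != 2:
        raise ValueError("subfield ID must be exactly two bytes")
    if len(data) > 0xFFFF:
        raise ValueError("subfield data does not fit a 16-bit LEN")
    return subfield_id + struct.pack("<H", len(data)) + data


def build_extra_field(subfields) -> bytes:
    """Encode XLEN followed by all subfields."""
    body = b"".join(build_subfield(si, data) for si, data in subfields)
    return struct.pack("<H", len(body)) + body


def parse_extra_field(extra: bytes):
    """Strictly parse XLEN plus subfields; raise ValueError if malformed."""
    (xlen,) = struct.unpack_from("<H", extra, 0)
    body = extra[2:2 + xlen]
    if len(body) != xlen:
        raise ValueError("extra field shorter than XLEN")
    subfields = []
    pos = 0
    while pos < len(body):
        if pos + 4 > len(body):
            raise ValueError("truncated subfield header")
        si = body[pos:pos + 2]
        (length,) = struct.unpack_from("<H", body, pos + 2)
        pos += 4
        if pos + length > len(body):
            raise ValueError("subfield data overruns XLEN")
        subfields.append((si, body[pos:pos + length]))
        pos += length
    return subfields


# Assumed payload for illustration only: two 32-bit little-endian sizes.
payload = struct.pack("<II", 100, 250)

# Well-formed: "sl", LEN = 8, then the 8 data bytes.
well_formed = build_extra_field([(b"sl", payload)])

# Malformed in the way described above: the LEN after "sl" is missing,
# so a strict parser reads the first two data bytes as the length.
malformed = struct.pack("<H", 2 + len(payload)) + b"sl" + payload
```

Calling parse_extra_field(well_formed) returns the single "sl" subfield, while parse_extra_field(malformed) raises ValueError because the bytes read as LEN point past the end of the field.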
Tim Rühsen writes:
> Unzipping it and zipping it again results in a 2387 byte file.
>
> So, at first glance, it looks like Wget compresses very suboptimally.
> But I won't say it is a bug before I take a deeper look... (in the next days).
That's probably working as intended. By conven
On Friday, 29 March 2013, Andy Jackson wrote:
> When using wget 1.14 to generate warc.gz files, e.g.
>
> wget -O tempname --warc-file="output" "http://example.com"
>
> the files this creates do not play back well using the Internet Archive's
> warc.gz parsers, throwing errors like
>
> "Inva
When using wget 1.14 to generate warc.gz files, e.g.
wget -O tempname --warc-file="output" "http://example.com"
the files this creates do not play back well using the Internet Archive's
warc.gz parsers, throwing errors like
"Invalid FExtra length/records".
It appears wget may be creating sl