Hello,

I was looking for an alternative to HTTrack to archive single pages and
found wget. It works like a charm thanks to the "--page-requisites"
option. However, I would like to post-process the archived files. I thought
of using the logs, but they seem to be just a bunch of free-form messages. I
could use a well-formed XML logfile :). Here is a sample from my logfile,
generated using the "--output-file" option:

--- Start ---
--18:29:29--  http://www.../<file_name>
           => `www...'
Resolving www...... done.
Connecting to www...[<IP>]:<port>... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K .......... .......... ....                               94.54 KB/s

18:29:30 (94.54 KB/s) - `www.../<saved_file>' saved [25170]
--- End ---

I think I could quickly come up with a PHP script to parse it, but is there
any existing solution available to do it? I searched for "log parser" in
the group archives and found no results. Moreover, I definitely think the
logfile could use some serious restructuring, but I understand that it was
not primarily meant to be parsed. It's just a logfile after all :).
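
To make the idea concrete, here is a rough sketch of the kind of parser I
have in mind, written in Python rather than PHP. It only assumes the line
shapes visible in the sample above; other wget versions or locales may print
different messages. The function names and the XML element names (wgetlog,
download, url, ...) are made up for illustration and are not anything wget
itself produces.

```python
import re
import xml.etree.ElementTree as ET

# These patterns cover only the line shapes seen in the sample log;
# they are assumptions, not a description of every wget log format.
REQUEST = re.compile(r"^--(\d{2}:\d{2}:\d{2})--\s+(\S+)\s*$")
STATUS = re.compile(r"awaiting response\.\.\. (\d{3})\s*(.*)$")
LENGTH = re.compile(r"^Length: (\S+)(?: \[([^\]]+)\])?")
SAVED = re.compile(r"^(\d{2}:\d{2}:\d{2}) \((.+?)\) - `(.+)' saved \[(\d+)")


def parse_wget_log(text):
    """Return one dict per download block found in the log text."""
    entries = []
    current = None
    for line in text.splitlines():
        m = REQUEST.match(line)
        if m:
            # A "--HH:MM:SS--  URL" line starts a new download block.
            current = {"started": m.group(1), "url": m.group(2)}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first request line
        m = STATUS.search(line)
        if m:
            current["status"] = m.group(1)
            continue
        m = LENGTH.match(line)
        if m:
            current["length"] = m.group(1)  # may be "unspecified"
            if m.group(2):
                current["content_type"] = m.group(2)
            continue
        m = SAVED.match(line)
        if m:
            current["finished"] = m.group(1)
            current["rate"] = m.group(2)
            current["saved_as"] = m.group(3)
            current["bytes"] = m.group(4)
    return entries


def to_xml(entries):
    """Serialize the parsed entries as a small XML document (hypothetical schema)."""
    root = ET.Element("wgetlog")
    for entry in entries:
        node = ET.SubElement(root, "download")
        for key, value in entry.items():
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling to_xml(parse_wget_log(open("wget.log").read())) would then give one
<download> element per fetched file ("wget.log" being whatever name was
passed to "--output-file"). Failed or redirected requests are not handled
here, since the sample does not show what those lines look like.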

JM.


