Hello, I was looking for an alternative to HTTrack to archive single pages and found wget. It works like a charm thanks to the "--page-requisites" option. However, I would like to post-process the archived files. I thought of using the logs, but they seem to be just a bunch of messages. I could use a well-formed XML logfile :). Here is a sample from my logfile, generated using the "--output-file" option:
--- Start ---
--18:29:29--  http://www.../<file_name>
           => `www...'
Resolving www...... done.
Connecting to www...[<IP>]:<port>... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K .......... .......... ....    94.54 KB/s

18:29:30 (94.54 KB/s) - `www.../<saved_file>' saved [25170]
--- End ---

I think I could quickly come up with a PHP script to parse it, but is there an existing solution for this? I searched the group archives for "log parser" and found no results. I also think the logfile could use some serious restructuring, though I understand that it was not primarily designed to be parsed. It's just a logfile after all :).

JM.
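
For illustration, here is a minimal sketch of the kind of parser described above, written in Python rather than PHP. It assumes the log keeps its original line breaks and follows the message layout shown in the sample; other wget versions or locales may word these lines differently. The names `parse_wget_log` and `to_xml`, the record keys, and the XML element names are hypothetical and chosen only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Patterns modelled on the sample log above (an assumption, not a wget spec).
REQUEST = re.compile(r"^--(\d{2}:\d{2}:\d{2})--\s+(\S+)")
STATUS = re.compile(r"awaiting response\.\.\. (\d{3})")
LENGTH = re.compile(r"^Length: .*\[([^\]]+)\]")
SAVED = re.compile(r"^\d{2}:\d{2}:\d{2} \(.+?\) - `(.+?)' saved \[(\d+)")


def parse_wget_log(text):
    """Return one dict per request found in the log text."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = REQUEST.match(line)
        if m:
            # A "--HH:MM:SS--  URL" line starts a new record.
            current = {"started": m.group(1), "url": m.group(2),
                       "status": None, "content_type": None,
                       "saved_as": None, "bytes": None}
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first request
        m = STATUS.search(line)
        if m:
            current["status"] = int(m.group(1))
            continue
        m = LENGTH.match(line)
        if m:
            current["content_type"] = m.group(1)
            continue
        m = SAVED.match(line)
        if m:
            current["saved_as"] = m.group(1)
            current["bytes"] = int(m.group(2))
    return records


def to_xml(records):
    """Serialize the records as a small XML document (hypothetical schema)."""
    root = ET.Element("wgetlog")
    for rec in records:
        entry = ET.SubElement(root, "entry")
        for key, value in rec.items():
            if value is not None:  # skip fields the log did not provide
                ET.SubElement(entry, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Reading the file written by "--output-file" and passing its contents to `parse_wget_log` gives a list of dictionaries that can be post-processed directly, or handed to `to_xml` if an XML view of the log is preferred.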