> It would be nice if wget could launch a custom command that
> would output to wget a list of links in the file just downloaded.
> This would make it fairly easy to download links specified using
> JavaScript and CSS.
> 
> For example on [1], I could use this command:
> 
> fgrep "MM_openBrWindow('" . |
>  sed "s/MM_openBrWindow('\([^']*\)'/\nXXXXXXXXXXXXXXXXXXXXXXXXX\1\n/g" |
>  grep XXXXXXXXXXXXXXXXXXXXXXXXX |
>  sed s/XXXXXXXXXXXXXXXXXXXXXXXXX//g
> 
> Alternatively, the ability to specify additional regexes that
> match links would also work.


This is exactly what I was asking about a few days ago. There is
no such functionality in wget. As I found in the archives, the
developers do not want to implement regex matching because of the
large size of regex libraries. But external filtering seems to be
an interesting solution.

Recently I made a simple patch to work around this problem. The
patch adds an --output_filter parameter, which specifies an
external filtering program. The filter is applied not to the list
of URLs, but to the whole fetched file, right after the file is
saved. The file name is passed as the last argument to the filter
command. I posted this patch to the wget-patches mailing list, and
I can send it to you if you're able to recompile wget on your
system.
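For illustration, a filter script in the spirit of that patch
description might look like the sketch below. This is an assumption
about how such a filter would be written, not the actual patch: I
assume the saved file name arrives as the last (here, only)
argument, that the filter prints one extracted URL per line, and
that GNU grep's -o option is available.

```shell
#!/bin/sh
# Hypothetical filter for the proposed --output_filter option.
# wget would pass the name of the just-saved file as the last
# argument; with a single argument, that is "$1".
file="$1"

# Print the target of each MM_openBrWindow('...') call, one per
# line, by matching the call and stripping everything but the URL.
grep -o "MM_openBrWindow('[^']*'" "$file" |
  sed "s/MM_openBrWindow('\([^']*\)'/\1/"
```

Run as e.g. `./filter.sh page.html`; a page containing
MM_openBrWindow('popup.html','win','width=400') would yield
popup.html on stdout.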


-- 
Sergey Martynoff
