This command fails:
wget -p \
'http://asx.com.au/asx/research/CompanyInfoSearchResults.jsp?searchBy=asxCode&allinfo=on&asxCode=WES&companyName=&principalActivity=&industryGroup=NO#'

because the retrieved document contains an image with a very long URI, and wget generates a file name that is too long for the filesystem.

The problem arose on SuSE 10.0 with reiserfs. I don't know whether reiserfs has different limits on file-name length from other filesystems such as ext3.

What I hope to do is create a document that I can browse later: I don't need to have the image regenerated.

What would work for me is for wget to provide a means of calling an external program or script, which I could write, to transform the URI (or the local part of it) into a file name. I envisage wget writing the URL to the script's stdin and reading the result from its stdout. It would be like the facility in squid which allows external programs (such as squidGuard) to inspect the URIs of documents being fetched and perhaps transform them into something completely different: www.sex.com to www.nosex.lan, perhaps.
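To make the idea concrete, here is a minimal sketch of what such a hook script could look like. Note that wget has no such hook today; the asx-<code>.html naming scheme and the sed pattern are purely illustrative assumptions for the ASX page above.

```shell
#!/bin/sh
# Hypothetical filename-transform hook for wget (sketch only):
# given a URL, emit a short, safe local file name on stdout.
transform_url() {
    url=$1
    # Pull out the asxCode= query parameter, e.g. "WES"
    code=$(printf '%s\n' "$url" | sed -n 's/.*[?&]asxCode=\([A-Za-z]*\).*/\1/p')
    if [ -n "$code" ]; then
        printf 'asx-%s.html\n' "$code"
    else
        # Fall back to a short hash so the name is always short and unique
        printf '%s.html\n' "$(printf '%s' "$url" | md5sum | cut -c1-16)"
    fi
}

# Example: the failing URL from above yields a short, stable name
transform_url 'http://asx.com.au/asx/research/CompanyInfoSearchResults.jsp?searchBy=asxCode&allinfo=on&asxCode=WES'
# -> asx-WES.html
```

In the stdin/stdout protocol proposed above, wget would feed each URL to the script one per line and read the file name back, so the same script could wrap `transform_url` in a read loop.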

In the particular case of the document above, the only significant difference between that document and others I'd wish to retrieve is a code, typically three letters (WES above).

It would also be cool to be able to transform the URL before it's retrieved, as I can do in Squid: I have in mind changing the search arguments passed to a CGI or similar to get results more to my liking. An example site would be www.realestate.com.au, where one can search for properties for sale: the search function supports results pages of up to 100 entries, but the search form doesn't offer that. One might also be able to drop ads by transforming their URIs to, say, http://127.0.0.1/images/blank.png. However, that's less important to me than the first feature: the first problem prevents me from doing something I want to do.
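A request-side rewriter, in the style of a squid redirector, might look like the sketch below: one URL in per line, the (possibly rewritten) URL out. The pageSize parameter name and the rewrite rules are made up for illustration; they are not the real site's API.

```shell
#!/bin/sh
# Sketch of a squid-redirector-style URL rewriter as a hypothetical
# wget request hook. Parameter names here are invented examples.
rewrite_url() {
    url=$1
    case $url in
        *realestate.com.au*)
            # Hypothetically bump a results-per-page argument to 100
            printf '%s\n' "$url" | sed 's/\([?&]pageSize=\)[0-9]*/\1100/'
            ;;
        *.gif|*/ads/*)
            # Replace ad images with a local blank image
            printf 'http://127.0.0.1/images/blank.png\n'
            ;;
        *)
            # Anything else passes through unchanged
            printf '%s\n' "$url"
            ;;
    esac
}
```

Squid calls its redirector once per request over a pipe; wget could adopt the same one-line-in, one-line-out convention so existing redirector scripts would work with little change.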
