This command fails:
wget -p \
'http://asx.com.au/asx/research/CompanyInfoSearchResults.jsp?searchBy=asxCode&allinfo=on&asxCode=WES&companyName=&principalActivity=&industryGroup=NO#'

because the retrieved document contains an image with a very long URI, and wget generates a file name that is too long for the filesystem.

The problem arose on SuSE 10.0 with reiserfs. I don't know whether reiserfs has different limits on file-name length from other filesystems such as ext3.

What I hope to do is create a document that I can browse later: I don't need to have the image regenerated.

What would work for me is for wget to provide a means of calling an external program or script, which I could write, to transform the URI (or the local part of it) into a file name. I envisage wget writing the URL to the script's stdin and reading the result from its stdout. It would be like the facility in squid which allows external programs (such as squidGuard) to inspect the URIs of documents being fetched and perhaps transform them into something completely different: www.sex.com to www.nosex.lan, perhaps.
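To make the idea concrete, here is a minimal sketch of what such a hook script could look like. Note that wget has no such hook today; the asx-<code>.html naming scheme and the sed pattern are purely illustrative assumptions for the ASX page above.

```shell
#!/bin/sh
# Hypothetical filename-transform hook for wget (sketch only):
# given a URL, emit a short, safe local file name on stdout.
transform_url() {
    url=$1
    # Pull out the asxCode= query parameter, e.g. "WES"
    code=$(printf '%s\n' "$url" | sed -n 's/.*[?&]asxCode=\([A-Za-z]*\).*/\1/p')
    if [ -n "$code" ]; then
        printf 'asx-%s.html\n' "$code"
    else
        # Fall back to a short hash so the name is always short and unique
        printf '%s.html\n' "$(printf '%s' "$url" | md5sum | cut -c1-16)"
    fi
}

# Example: the failing URL from above yields a short, stable name
transform_url 'http://asx.com.au/asx/research/CompanyInfoSearchResults.jsp?searchBy=asxCode&allinfo=on&asxCode=WES'
# -> asx-WES.html
```

In the stdin/stdout protocol proposed above, wget would feed each URL to the script one per line and read the file name back, so the same script could wrap `transform_url` in a read loop.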

In the particular case of the document above, the only significant difference between that document and others I'd wish to retrieve is a code, typically three letters (WES above).

It would also be cool to be able to transform the URL before it's retrieved, as I can do in Squid: I have in mind changing the search arguments passed to a CGI or similar to get results more to my liking. An example site would be www.realestate.com.au, where one can search for properties for sale: the search function supports results pages of up to 100 entries, but the search form doesn't offer that. One might also be able to drop ads by transforming their URIs to, say, http://127.0.0.1/images/blank.png. However, that's less important to me than the first feature: the first problem prevents me from doing something I want to do.
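A request-side rewriter, in the style of a squid redirector, might look like the sketch below: one URL in per line, the (possibly rewritten) URL out. The pageSize parameter name and the rewrite rules are made up for illustration; they are not the real site's API.

```shell
#!/bin/sh
# Sketch of a squid-redirector-style URL rewriter as a hypothetical
# wget request hook. Parameter names here are invented examples.
rewrite_url() {
    url=$1
    case $url in
        *realestate.com.au*)
            # Hypothetically bump a results-per-page argument to 100
            printf '%s\n' "$url" | sed 's/\([?&]pageSize=\)[0-9]*/\1100/'
            ;;
        *.gif|*/ads/*)
            # Replace ad images with a local blank image
            printf 'http://127.0.0.1/images/blank.png\n'
            ;;
        *)
            # Anything else passes through unchanged
            printf '%s\n' "$url"
            ;;
    esac
}
```

Squid calls its redirector once per request over a pipe; wget could adopt the same one-line-in, one-line-out convention so existing redirector scripts would work with little change.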
