File name too long
Hi all, is there a fix for when file names get too long? Example:

URL=http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21;

bash-2.04$ wget -kxE $URL

--15:16:37--  http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21
           => `search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21'
Proxy request sent, awaiting response... 301 Moved Permanently
Location: /wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2 [following]
--15:16:37--  http://search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
           => `search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2'
Length: 46,310 [text/html]

search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html: File name too long

Cannot write to `search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html' (File name too long).

search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html: File name too long
Converting search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html... nothing to do.
Converted 1 files in 0.00 seconds.

... apart from that, the main thing I'm looking for is how to obtain the search results: I still haven't managed to fetch the result page from search.ebay.de and then download the links it contains to cgi.ebay.de in one go:

wget -kxrE -l1 -D cgi.ebay.de -H $URL
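PS: for reproducing this, the query string has to be quoted, or the shell treats ';' (and any '&') as command separators; and as far as I can tell the "File name too long" is the filesystem's per-component name limit rather than anything inside wget. A minimal sketch, assuming bash and a POSIX getconf; the '...' only stands in for the full search URL above:

  getconf NAME_MAX .      # per-component file name limit on this filesystem, typically 255 bytes

  URL='http://search.ebay.de/ws/search/SaleSearch?...'   # full URL as above, in single quotes
  wget -kxE "$URL"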
Re: File name too long
On 2005-03-21 15:32, [EMAIL PROTECTED] wrote:

*** This is not a problem of wget, but of your filesystem. Try to do

touch search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html

I'm very sure that my file system has some limits somewhere - but I suppose a web server may create virtual URLs which are too long, or which contain illegal characters, for almost any file system around. The file name here might get repaired by some regex, e.g.

wget_?catref=C6coaction=comparecoentrypage=searchcopagenum=1dfte=Q2d1dfts=Q2d1flt=9from=R9fsoo=2fsop=2saetm=396614sojs=1sspagename=ADMEQ3aBQ3aSSQ3aDEQ3a21version=2.html

However, I'd be comfortable enough with some fixed length or character limitation, such as a 'trim' extension:

-tc, --trimcharacter=char   cut the file name after a character, such as _
-tl, --trimlength=num       cut the file name after num characters
-ts, --trimsuffix=num       digits used for incrementing cut file names
-tt, --trimtable=file       log the trimmed and the original file name to file

For the moment I'd be happy enough with saving to an <md5sum>.html file name instead of a file name that is too long for my fs. The output log could then tell me about the shortened and the original file name.

... get the result from search.ebay.de and then download the links to cgi.ebay.de in one: wget -kxrE -l1 -D cgi.ebay.de -H $URL

*** maybe create a SHA1 sum of the request and store the result in this file (but you will not know what the original request was, unless you keep some DB of requests). Or do just simple counting:

URL=...
sha1sum=$( echo -n $URL | sha1sum )
echo $sha1sum $URL >> SHA1-URL.db
wget -O $sha1sum.html [other options] $URL

or

URL=...
i=0
echo $i $URL >> URL.db
wget -O search-$i.html $URL

Could this be your solution?

Nice idea - I'll give it a try. However, it does not answer the -D problem itself. I'm afraid that still requires some further awk/sed processing of the result?

Thanks, Martin
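PS: for what it's worth, a fleshed-out version of that idea - hash-named output plus a lookup table of the original requests, roughly what the --trimtable option sketched above would record. An untested sketch, assuming bash and coreutils sha1sum; the script name and SHA1-URL.db are just placeholders:

  #!/bin/bash
  # wget-by-hash.sh -- fetch one URL, save it under its SHA1 hash,
  # and log the hash -> URL mapping so the original request can be looked up later
  URL="$1"; shift                                 # remaining arguments are handed on to wget
  hash=$(printf '%s' "$URL" | sha1sum | cut -d' ' -f1)
  echo "$hash $URL" >> SHA1-URL.db                # the "trim table": short name vs. original URL
  wget "$@" -O "$hash.html" "$URL"

Called as e.g. ./wget-by-hash.sh "$URL", it always leaves a fixed-length file name no matter how long the query string is, and SHA1-URL.db maps it back to the original request.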
Re: wget -D 12.ab.de -i $file
On Sat, 05 Mar 2005, Hrvoje Niksic wrote:

> -D filters the URLs encountered with -r. Specifying an input file is the same as specifying those URLs on the command line. If you need to exclude domains from the input file, I guess you can use something like `grep -v'.

Hi Hrvoje,

thanks - but grep alone is not a suitable option. I'd first have to combine it with some perl/sed/awk, both to merge <a href=...> tags that span more than one line and to split lines that contain more than one href, just to make sure that only the desired domains end up in the list. It's a pity that the filter options only work with -r, but not with -i (which is really just a special case of -r -l1).

Application example: I have a mailbox file which includes those URLs, and I'd like to download everything from one particular site. One workaround might be to convert the mailbox to a basic html file and read it via http, in order to force the -r -H -D branch, instead of using -D -F -i locally.

Is there an easier way to tell wget that -i is 'within' the recursive path?

Thanks, Martin
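PS: roughly the kind of preprocessing I mean, as an untested sketch (GNU sed/grep assumed; input.html and 12.ab.de are stand-ins): flatten the file so every href sits on its own line, keep only the wanted host, and hand the list to wget on stdin.

  # join the whole file into one line so tags broken across lines are merged,
  # then start a new line before every href so lines with several links get split
  tr '\n' ' ' < input.html \
    | sed 's/href="/\nhref="/g' \
    | sed -n 's/^href="\([^"]*\)".*/\1/p' \
    | grep '^http://12\.ab\.de/' \
    | wget -p -x -i -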
wget -D 12.ab.de -i $file
Hi all,

my problem: I have an html file that contains links to both, e.g.,

http://12.ab.de/xyz
http://34.ab.de/

I actually want to perform

wget -D12.ab.de -Ixyz -p -F -i input.html

However, this fetches files from 34.ab.de as well. I tried --exclude-domains 34.ab.de, without success. I'm afraid that URLs read from an input file can't be passed through a -D filter?

What would be a reasonable behaviour for combining -i and -D?

Wget 1.8.2

Thanks, Martin
Re: Please make wget follow javascript links!
On Mon 2002-09-09 (21:42), [EMAIL PROTECTED] wrote:

>> I don't want a JS engine - but is it that hard to create a filter list that identifies anything that looks like a file? The main JS commands should be easy to understand - maybe some kind of filter file with regexps telling where to find the next file could be done easily (while it's harder to build that filter-reading mechanism in the first place):
>>
>>   javascript:winopen("somepage.html", size and location)
>>   javascript:winopen("\([^"]*\)".*) -> \1
>>
>> (PS: on the other hand it's pretty simple to run a grep and detect those files manually)
>
> Really? Have you tried this?
>
>   javascript:window.open("i" + "nde" + "x.h" + "tml")
>
> winopen() is not even a JS command, it is a user-defined function. What worked for you in a specific case does not work in the general case.

Agreed - the more complicated you want to make links by JS, the more you'll find the need for a real JS engine.

> If you want this kind of support, you need a JS engine.

... but in general, JS is mainly used to open a window of its own with a given, full html file name. A basic filter on *.htm* could add some extra hits.

Kind regards
Martin
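PS: the filter rule above, spelled out as a shell one-liner against a saved page - only a sketch of the simple case (a literal, double-quoted file name as the first argument), assuming GNU grep/sed and a placeholder page.html; anything assembled at runtime like the window.open example will of course slip through:

  # pull the first quoted argument out of javascript:somefunc("...") style links,
  # then keep only hits that look like html files
  grep -o "javascript:[A-Za-z_.]*(\"[^\"]*\"" page.html \
    | sed 's/.*("\([^"]*\)"/\1/' \
    | grep -i '\.htm'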
wget -r and JavaScript
Hi all,

is there a chance to let wget retrieve pages that are linked via JavaScript? Currently I don't manage to download e.g.

<a href="inside.phtml?page=area1" onClick="setButton('image1')" target="index" onMouseOver="turnOn('image1','Area');return true" onMouseOut="turnOff('image1');">

or

<a href="javascript:makeNewWindow('icons.phtml?iconId=21', 'explain', 'height=500,width=450,location=no,menubar=no,scrollbars=yes,resizable=yes,toolbar=no,status=yes')" onMouseOver="window.status = 'Icon Explanation'; return true" onMouseOut="window.status = 'Area'; return true"><img src="../phpimages/product/functionicon/area.jpg" border=0 alt="Area" width=40 height=40></a>

Kind regards
Martin
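PS: a rough two-step workaround for exactly this kind of page, untested (GNU grep/sed assumed; www.example.com is a placeholder, and the naive prefixing only covers links relative to that one directory): fetch the page once, pull out everything that looks like a .phtml target - whether it sits in a plain href or inside a javascript: handler - and feed the list back to wget. wget itself never follows the JS links, so this stays a manual pass:

  BASE='http://www.example.com/'          # placeholder: base URL of the real site

  wget -O - "${BASE}index.phtml" \
    | grep -o '[A-Za-z0-9_./?=&-]*\.phtml[A-Za-z0-9_.?=&-]*' \
    | sort -u \
    | sed "s|^|$BASE|" \
    | wget -x -i -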