File name too long

2005-03-21 Thread Martin Trautmann
Hi all,

is there a fix when file names are too long?


Example:

URL=http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21;



-

bash-2.04$ wget -kxE $URL
--15:16:37--  
http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21
           => 
`search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21'

Proxy request sent, awaiting response... 301 Moved Permanently
Location: 
/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
 [following]
--15:16:37--  
http://search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
           => 
`search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2'

Length: 46,310 [text/html]
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html:
 File name too long

Cannot write to 
`search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html'
 (File name too long).
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html:
 File name too long
Converting 
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html...
 nothing to do.
Converted 1 files in 0.00 seconds.



Apart from that, the main thing I'm looking for is how to obtain the
search results. I still haven't figured out how to get the result from
search.ebay.de and then download the links to cgi.ebay.de in one go:

  wget -kxrE -l1 -D cgi.ebay.de -H $URL
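
If one run turns out to be impossible, a rough two-step fallback might do
(untested sketch; it assumes GNU grep with -o and that the result page
carries absolute cgi.ebay.de links, whose &amp; entities still need
unescaping):

  wget -O result.html "$URL"
  grep -o 'http://cgi\.ebay\.de/[^"]*' result.html | sed 's/&amp;/\&/g' | sort -u > links.txt
  wget -kxE -i links.txt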


Re: File name too long

2005-03-21 Thread Martin Trautmann
On 2005-03-21 15:32, [EMAIL PROTECTED] wrote:
 *** This is not a problem of wget, but of your filesystem. Try to do 
 
 touch 
 search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html

I'm quite sure that my file system has limits somewhere - but I suppose
a web server may generate virtual URLs that are too long, or that contain
illegal characters, for almost any file system around.


The file name here might get repaired by some regex, e.g.
wget_?catref=C6coaction=comparecoentrypage=searchcopagenum=1dfte=Q2d1dfts=Q2d1flt=9from=R9fsoo=2fsop=2saetm=396614sojs=1sspagename=ADMEQ3aBQ3aSSQ3aDEQ3a21version=2.html

However, I'd be comfortable enough with some fixed length or char
limitation, such as a 'trim' extension:

  -tc, --trimcharacter char   cut filename after character, such as _
  -tl, --trimlength num       cut filename after num characters
  -ts, --trimsuffix num       digits used for incremented cut filenames
  -tt, --trimtable file       log trimmed filename and original to file
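
Until something like that exists, I suppose the same effect can be faked
with a small wrapper script. A rough sketch (the option names above, the
wrapper name, the log file and the 100-character limit are all made up for
illustration):

  # derive a shortened local name from the URL, keep a counter against
  # collisions, and log the mapping
  trimget () {
      url=$1
      name=$( echo "$url" | sed -e 's,^http://,,' -e 's,[/?&=],_,g' | cut -c1-100 )
      i=0
      while [ -e "$name.$i.html" ]; do i=$(( i + 1 )); done
      echo "$name.$i.html $url" >> trimtable.log
      wget -O "$name.$i.html" "$url"
  }

  trimget "$URL"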


For the moment I'd be happy enough with saving to an md5 checksum plus
.html as the filename, instead of a filename too long for my fs.
The output log could then tell me both the shortened and the original
filename.
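
Roughly like this (just a sketch; MD5-URL.log is a made-up name):

  md5=$( echo -n "$URL" | md5sum | cut -d' ' -f1 )
  echo "$md5 $URL" >> MD5-URL.log
  wget -O "$md5.html" "$URL"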

  search.ebay.de and then download the links to cgi.ebay.de in one:
  
    wget -kxrE -l1 -D cgi.ebay.de -H $URL
 
 *** Maybe create a SHA1 sum of the request and store the result in that file
 (but you will not know what the original request was, unless you keep some
 DB of requests). Or just do simple counting:
 
 URL=.
 sha1sum=$( echo -n $URL | sha1sum | cut -d' ' -f1 )
 echo $sha1sum $URL >> SHA1-URL.db
 wget -O $sha1sum.html [other options] $URL
 
 or
 
 URL=
 i=0
 echo $i $URL >> URL.db
 wget -O search-$i.html $URL
 
 Could this be your solution?

Nice idea - I'll give it a try. However, it does not answer the -D problem
itself. I'm afraid that still requires some further awk/sed processing of
the result?

Thanks,
Martin


Re: wget -D 12.ab.de -i $file

2005-03-07 Thread Martin Trautmann

On Sat, 05 Mar 2005 Hrvoje Niksic wrote:
 -D filters the URLs encountered with -r.  Specifying an input file is
 the same as specifying those URLs on the command-line.  If you need to
 exclude domains from the input file, I guess you can use something
 like `grep -v'.

Hi Hrvoje,

thanks - but grep is not a suitable option. I'd have to combine it with some
perl/sed/awk first, both to merge <a href> tags that span more than one line
and to split lines that contain more than one href, just to make sure that
only the desired domains are listed.

It's a pity that the filter options only work with -r, but not with -i
(which could be seen as a special case of -r -l1).

Application example: I have a mailbox file which includes those URLs. I'd
like to download everything from one particular site.

One workaround might be to convert this mailbox to a basic html file and
read it via http in order to force the -r -H -D branch, instead of using
-D -F -i locally.
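
For the simple case where every URL is complete on a single line, a crude
pre-filter would already do (sketch only; it assumes GNU grep, 'mbox' and
the domain are placeholders, and it breaks on the wrapped tags and
multi-href lines described above):

  grep -o 'http://12\.ab\.de/[^ ">]*' mbox | sort -u > urls.txt
  wget -p -i urls.txt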


Is there an easier way to tell wget that -i is 'within' the
recursive path?

Thanks,
Martin


wget -D 12.ab.de -i $file

2005-03-04 Thread Martin Trautmann
Hi all,

my problem: I have an html file that includes links to both, e.g.

  http://12.ab.de/xyz
  http://34.ab.de/


I actually want to perform a 

  wget -D12.ab.de -Ixyz -p -F -i input.html

However, this pulls in files from 34.ab.de as well.
I tried --exclude-domains 34.ab.de without success.

I'm afraid that URLs read from an input file are simply not passed through
the -D filter? What would be a reasonable behaviour when combining -i and -D?

Wget 1.8.2

Thanks,
Martin


Re: Please make wget follow javascript links!

2002-09-10 Thread Martin Trautmann

On Mon 2002-09-09 (21:42), [EMAIL PROTECTED] wrote:
  I don't want a JS engine - but is it that hard to create a filter list
  that identifies anything that looks like a file?
  
  The main JS commands should be easy to understand - but maybe some kind
  of filter file with regexps telling where to find the next file could be done
  easily (while it's harder to create this filter-reading mechanism in the first place)
  
  javascript:winopen("somepage.html", size and location)
  javascript:winopen("\([^"]*\)".*)  ->  \1
  
  (PS: on the other hand it's pretty simple to run a grep and detect those
  files manually)
 
 Really? Have you tried this?
 javascript:window.open("i" + "nde" + "x.h" + "tml")
 
 winopen() is not even a JS command, it is a user-defined function.
 What worked for you in a specific case does not work in the general case.

agreed - the more complicated you want to make links by JS, the more
you'll find the need for a real JS engine.

 If you want this kind of support, you need a JS engine.

... but in general, JS is mainly used to open a window of its own with a
given, full html file name. A basic filter on *.htm* could add some
extra hits.
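
For example, even something this crude pulls the single-quoted *.htm* names
out of a saved page (sketch; assumes GNU grep, and page.html stands for
whatever got saved):

  grep -o "'[^']*\.htm[^']*'" page.html | tr -d "'" | sort -u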

Kind regards
Martin



wget -r and JavaScript

2002-05-17 Thread Martin Trautmann


Hi all,

is there a chance to let wget retrieve pages linked by JavaScript?

Currently I can't manage to download e.g.

 <a href="inside.phtml?page=area1" onClick="setButton('image1')"
 target="index" onMouseOver="turnOn('image1','Area'); return true"
 onMouseOut="turnOff('image1');">

or 

<a href="javascript:makeNewWindow('icons.phtml?iconId=21', 'explain',
'height=500,width=450,location=no,menubar=no,scrollbars=yes,resizable=yes,toolbar=no,status=yes')"
onMouseOver="window.status = 'Icon Explanation'; return true"
onMouseOut="window.status = 'Area'; return true"><img
src="../phpimages/product/functionicon/area.jpg" border=0 alt="Area"
width=40 height=40></a>
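
For pages like the second example, where the target sits in a quoted
argument on a single line, the file name can at least be pulled out by hand,
e.g. (sketch; saved.html stands for whatever the page was saved as):

  sed -n "s/.*makeNewWindow('\([^']*\)'.*/\1/p" saved.html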


Kind regards
Martin