trouble loading and installing wget

2006-12-11 Thread Siddiqui, Kashif
Hello,

 

I'm trying to install wget on my itanium 11.23 system and getting the
following error when executing it:

 

[EMAIL PROTECTED]:/tmp$ wget
http://hpux.cs.utah.edu/hppd/cgi-bin/redirect?hpux/Gnu/wget-1.10.2/wget-
1.10.2-ia64-11.23.depot.gz

--08:31:36--
http://hpux.cs.utah.edu/hppd/cgi-bin/redirect?hpux/Gnu/wget-1.10.2/wget-
1.10.2-ia64-11.23.depot.gz

           => `redirect?hpux%2FGnu%2Fwget-1.10.2%2Fwget-1.10.2-ia64-11.23.depot.gz'

Resolving hpux.cs.utah.edu... 155.98.64.90

/usr/lib/hpux32/dld.so: Unsatisfied code symbol '__umodsi3' in load
module '/usr/local/bin/wget'.

Killed

 

 

If I use the source code and run the configure script, then do a 'make
install' I get the following error:

 

[EMAIL PROTECTED]:/tmp/wget-1.10.2$ make install

cd src && make CC='gcc' CPPFLAGS='-O' DEFS='-DHAVE_CONFIG_H
-DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
-DLOCALEDIR=\"/usr/local/share/locale\"'  CFLAGS='-O' LDFLAGS=''
LIBS='-ldl  -L/usr/local/lib/hpux32 /usr/local/lib/hpux32/libssl.so
/usr/local/lib/hpux32/libcrypto.so'  prefix='/usr/local'
exec_prefix='/usr/local' bindir='/usr/local/bin'
infodir='/usr/local/info' mandir='/usr/local/man' manext='1' install.bin

gcc -I. -I. -O  -DHAVE_CONFIG_H
-DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
-DLOCALEDIR=\"/usr/local/share/locale\" -O -c cmpt.c

gcc -I. -I. -O  -DHAVE_CONFIG_H
-DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
-DLOCALEDIR=\"/usr/local/share/locale\" -O -c connect.c

In file included from connect.c:41:

/usr/include/sys/socket.h:535: error: static declaration of 'sendfile'
follows non-static declaration

/usr/include/sys/socket.h:506: error: previous declaration of 'sendfile'
was here

/usr/include/sys/socket.h:536: error: static declaration of 'sendpath'
follows non-static declaration

/usr/include/sys/socket.h:508: error: previous declaration of 'sendpath'
was here

connect.c: In function 'bind_local':

connect.c:457: warning: passing argument 3 of 'getsockname' from
incompatible pointer type

connect.c: In function 'accept_connection':

connect.c:507: warning: passing argument 3 of 'accept' from incompatible
pointer type

connect.c: In function 'socket_ip_address':

connect.c:528: warning: passing argument 3 of 'getsockname' from
incompatible pointer type

connect.c:530: warning: passing argument 3 of 'getpeername' from
incompatible pointer type

 

 

Any idea's and assistance would be greatly appreciated.

 

Thank you very much.

 

-Kashif

 



Re: wget css parsing, updated to trunk

2006-12-11 Thread Ted Mielczarek

On 12/5/06, Ted Mielczarek [EMAIL PROTECTED] wrote:


Hello all,

I have updated my CSS parsing code for wget to trunk, the results are
here:
http://ted.mielczarek.org/code/wget-modified/trunk/

I will submit the patch to wget-patches shortly.

My original posting:
http://www.mail-archive.com/wget@sunsite.dk/msg09142.html




Is there any interest in this?  I'm using it for a private project, and I've
had some off-list interest in it, but nothing on-list.  Is it worth my time
to pursue this, or should I just consider it a private fork?

Regards,
-Ted


directory-prefix bug in Win32

2006-12-11 Thread Denis Golovan
Hi!

  With the -P or --directory-prefix command-line switch, wget v1.11 Beta 1 and
the later v1.11 Beta 1 (with spider patch) ignores the option and saves files in
the current directory. Wget v1.10.2 handled this correctly.
  I hope this bug won't live long :).





Re: wget css parsing, updated to trunk

2006-12-11 Thread R Kimber
On Mon, 11 Dec 2006 09:14:11 -0500
Ted Mielczarek wrote:

 Is there any interest in this?  I'm using it for a private project,
 and I've had some off-list interest in it, but nothing on-list.  Is
 it worth my time to pursue this, or should I just consider it a
 private fork?

As a user, I'm certainly interested and think css parsing is important.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


RE: ERROR 500 problem

2006-12-11 Thread Sandhu, Ranjit
Maybe the server limits the number of hits from the same IP address over a
given time period.  1400 pages is a lot; maybe they got mad at you and sent
you nothing but 500s from then on :)

Ranjit Sandhu
703.803.1755
SRA

-----Original Message-----
From: Yoav Atzmony [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 11, 2006 12:24 PM
To: [EMAIL PROTECTED]
Subject: ERROR 500 problem

Hi, I hope someone can shed light on this problem.

I am trying to crawl a particular site, and am getting strange results.
I had crawled it successfully in the past but lately I only am able to
crawl about 1400 of the 8000 pages.  I am constantly getting (as
reported in a verbose log file):
HTTP request sent, awaiting response... 500 Internal Server Error
11:04:18 ERROR 500: Internal Server Error.

This error happens intermittently on some pages during the first 1400
pages, e.g.
http://www.ryland.com/find-your-new-home/29-northern-kentucky/1115-claib
orne/11777-shenandoah.html

But then this is what happens in the log file, and subsequently ALL
files receive the error 500 (as shown in a non-verbose log file):
WARNING: Certificate verification error for www.ryland.com: unable to
get local issuer certificate
16:58:03
URL:https://www.ryland.com/home/contact-us/29-1361-community-and-floor-p
lan-information.html
[257903/257903] -
files/www.ryland.com/home/contact-us/29-1361-community-and-floor-plan-i
nformation.html
[1]
16:58:19 URL:http://www.ryland.com/home/29-1034-contact-us.html
[115640/115640] - files/www.ryland.com/home/29-1034-contact-us.html
[1]
http://www.ryland.com/find-your-new-home/29-northern-kentucky/1034-frenc
[EMAIL PROTECTED]/driving-directions.html:
16:58:23 ERROR 500: Internal Server Error.
http://www.ryland.com/find-your-new-home/29-northern-kentucky/1034-frenc
h-quarter-orleans/11256-summit.html:
16:58:27 ERROR 500: Internal Server Error.

Now if I was to call wget only on the page that failed with ERROR 500,
it would crawl just fine.  Here are settings I am using:
wget  www.ryland.com -o LogRyland3.txt -t 5 --random-wait -v

I have run wget numerous times on this site, and I receive ERROR 500 on
different pages before it reaches the point where all pages fail from
then on, as shown above.
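
To pin down the point at which every request starts failing, a non-verbose log like the excerpt above can be summarized with a short script. The following is only a sketch: it assumes that successful fetches appear as lines starting with a timestamp followed by "URL:" and failures as a timestamp followed by "ERROR nnn:", as in the excerpt. The function name `summarize` and the shape of its return value are made up for illustration.

```python
import re
from collections import Counter

# Line patterns taken from the non-verbose log excerpt quoted above.
OK_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) URL:")
ERR_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) ERROR (\d{3}):")


def summarize(lines):
    """Count successes and errors, and find where the final error run begins.

    Returns a dict with per-kind counts, the timestamp of the last
    successful fetch, and the number of errors logged after it.
    """
    counts = Counter()
    last_ok = None
    trailing_errors = 0
    for raw in lines:
        line = raw.strip()
        ok = OK_RE.match(line)
        if ok:
            counts["ok"] += 1
            last_ok = ok.group(1)
            trailing_errors = 0  # a success ends any run of errors
            continue
        err = ERR_RE.match(line)
        if err:
            counts["error " + err.group(2)] += 1
            trailing_errors += 1
    return {
        "counts": dict(counts),
        "last_ok": last_ok,
        "trailing_errors": trailing_errors,
    }
```

Calling `summarize(open("log.txt"))` on the log file would show the time of the last good page and how many errors followed it. Comparing that time across several runs may help tell a per-IP limit (failures start after a similar number of requests) from a server-side fault (failures start at unrelated points).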

And I have appended the INI file which has more settings (below).
Again, crawling this site used to work fine.  And crawling failed pages
works when I crawl them individually. Any help would be GREATLY
appreciated!

INI FILE STARTS HERE---
# Rewrote the wgetrc / wget.ini file from scratch, based on the manual for version 1.9

logfile = log.txt
tries = 1
timeout = 30
wait = 1
randomwait = on
quota = 5000m
restrict_file_names = windows
add_hostdir = on
span_hosts = off
dir_prefix = files
cache = off
recursive = on
use_proxy = off
robots = off
verbose = off
keep-session-cookies = on
save-cookies = sw_cookies.txt
check_certificate = off
#reclevel = 7

reject =
GIF,jpg,JPG,jpeg,JPEG,bmp,BMP,pdf,PDF,css,CSS,js,JS,mpeg,MPEG,mov,MOV,av
i,AVI,wmv,WMV,doc,DOC,ppt,PPT,csv,CSV,xls,XLS,txt,TXT,png,PNG,ra,RA,ram,
RAM,tif,TIF,zip,ZIP,rar,RAR,class,CLASS,swf,SWF,pl,xml,XML,mp3,MP3,sid,S
ID,ivr,IVR,psd,PSD,rft,RTF,dwf,DWF,abk,acl,acm,acp,act,acv,ad,adb,add,ad
m,adp,adr,af2,af3,afm,ai,aif,alb,all,ams,anc,ani,ans,api,apr,aps,arc,arj
,art,asa,asc,asd,asf,asm,ast,asx,att,avi,awd,b4,bak,bas,bat,bfc,bg,bi,bi
f,bin,bk,bks,bm1,bmk,bmp,brx,bs1,bsp,btm,cab,cal,cas,cat,cb,ccb,ccf,cch,
ccm,cda,cdf,cdi,cdr,cdt,cdx,cel,cfb,cfg,cgm,ch,chk,chp,cil,cim,cin,ck1,c
k2,ck3,ck4,ck5,ck6,cla,clp,cls,cmd,cmf,cmp,cmv,cnf,cnm,cnq,cnt,cob,cod,c
om,cpd,cpe,cpi,cpl,cpp,cpr,cpt,cpx,crd,crp,crt,csc,csp,css,csv,ct,ctl,cu
e,cur,cut,cv,cwk,cws,cxx,dat,dbf,dbx,dcr,dcs,dcx,ddf,def,der,dib,dic,dif
,dir,diz,dlg,dll,dmf,dmg,doc,dot,dpr,drv,drw,dsg,dsm,dsp,dsq,dsw,dwg,dxf
,emf,enc,eps,er1,erx,evy,ewl,exe,f77,f90,far,fav,fax,fh3,fif,fit,flc,fli
,flt,fmb,fmt,fmx,fog,fon,for,fot,fp,fp1,fp3,fpx,frm,frx,gal,gcp,ged,gem,
gen,gfc,gfi,gfx,gid,gif,gim,gix,gna,gnx,gra,grd,grp,gt2,gtk,gwx,gwz,gz,h
ed,hel,hex,hgl,hlp,hog,hpj,hpp,hqx,hst,ht,htx,ica,icb,icm,ico,idd,idq,if
f,igf,iif,ima,img,inc,inf,ini,inp,ins,iso,isp,isu,it,iw,jar,jav,jbf,jff,
jif,jmp,jn1,jpe,jpg,js,jtf,kdc,kfx,kye,lbm,ldb,leg,lha,lib,lis,log,lpd,l
rc,lst,lwo,lwp,lzh,lzs,m3d,mad,maf,mak,mam,map,maq,mar,mas,mat,max,maz,m
b1,mcc,mcs,mcw,mda,mdb,mde,mdl,mdn,mdw,mdz,med,mer,met,mi,mic,mid,mmf,mm
m,mod,mov,mp3,mpe,mpg,mpp,msg,msi,msn,msp,mtm,mus,mvb,mwp,nap,ncb,nsf,ns
t,ntf,obd,obj,obz,ocx,ofn,oft,okt,olb,ole,opt,or2,or3,org,p10,p65,pab,pa
k,pal,pat,pbk,pbm,pcd,pcl,pcs,pct,pcx,pdf,pdq,pfa,pfb,pfc,pfm,pgl,pgm,pi
c,pif,pig,pin,pix,pj,pkg,pl,plt,pm5,pm6,png,pnt,pot,pp4,ppa,ppm,pps,ppt,
pre,prf,prn,prs,prz,ps,psd,pst,ptm,pub,pwd,pwz,pxl,qad,qbw,qdt,qlb,qry,q
t,qtm,qxd,ra,ram,ras,raw,rc,rec,reg,res,rft,rle,rm,rmi,rov,rpt,rtf,rtm,s
3m,sam,sav,sc2,scc,scd,sch,scn,scp,scr,sct,sdl,sdr,sdt,sea,sep,shb,shg,s
hs,shw,sit,slk,snd,sqc,sqr,sty,svx,sys,t2t,tar,taz,tex,tga,tgz,the,thn,t

Re: trouble loading and installing wget

2006-12-11 Thread Steven M. Schweda
From: Siddiqui, Kashif

 I'm trying to install wget on my itanium 11.23 system [...]

   I assume that that's HP-UX 11.23, as in:

[EMAIL PROTECTED] uname -a
HP-UX td176 B.11.23 U ia64 1928826293 unlimited-user license

 /usr/lib/hpux32/dld.so: Unsatisfied code symbol '__umodsi3' in load
 module '/usr/local/bin/wget'.

   And where did you get _that_ copy of wget?

 If I use the source code and run the configure script, then do a 'make
 install' I get the following error:
 [...]
 gcc -I. -I. -O  -DHAVE_CONFIG_H
 -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
 -DLOCALEDIR=\"/usr/local/share/locale\" -O -c connect.c
 
 In file included from connect.c:41:
 
 /usr/include/sys/socket.h:535: error: static declaration of 'sendfile'
 follows non-static declaration
 [...]

   Complaints about header files are often caused by a bad GCC
installation (or an OS upgrade which confuses GCC).

   I just tried building my VMS-oriented 1.10.2c kit using GCC on one of
the HP TestDrive systems, and I had some trouble ('ld: Unsatisfied
symbol libintl_gettext in file getopt.o'), but that's much later than
compiling connect.c, which got only the (usual) warnings about the
pointers.  That's with:

http://antinode.org/dec/sw/wget.html
http://antinode.org/ftp/wget/wget-1_10_2c_vms/wget-1_10_2c_vms.zip 

[EMAIL PROTECTED] gcc --version
gcc (GCC) 3.4.3
[...]

And I have no idea whether the GCC installation there is good or bad. 
(But it seems to be better than yours.)

   I also tried it using HP's C compiler (CC=cc ./configure):

[EMAIL PROTECTED] cc -V
cc: HP C/aC++ B3910B A.06.12 [Aug 17 2006]

Here, the make ran to an apparently successful completion, but real
testing is not convenient on the TestDrive systems, so I can't say
whether it would actually work better than what you have.

[EMAIL PROTECTED] ./src/wget -V
GNU Wget 1.10.2c built on hpux11.23.
[...]

   So, I'd suggest using HP's C compiler, or else re-installing GCC. 
After that, I'd suggest using the ITRC HP-UX forum:

http://forums1.itrc.hp.com/service/forums/familyhome.do?familyId=117

 Any idea's and assistance [...]

   That's ideas, by the way.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street    (+1) 651-699-9818
   Saint Paul  MN  55105-2547


More detail on bug

2006-12-11 Thread Denis Golovan

  With the -P or --directory-prefix command-line switch, wget v1.11 Beta 1 and
 the later v1.11 Beta 1 (with spider patch) ignores the option and saves files
 in the current directory. Wget v1.10.2 handled this correctly.

  This incorrect behaviour appears only if the server's HTTP response contains
a Content-Disposition header.
Looking forward to the developers' comments!
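
One way to reproduce the condition described here is to point wget at a local server that always sends a Content-Disposition header. The following is a minimal sketch using Python's standard http.server module. The handler name, the served file name, and the helper `start_server` are assumptions made for illustration; none of them come from the wget sources or from the report.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello from the test server\n"
DISPOSITION = 'attachment; filename="served-name.bin"'  # hypothetical file name


class DispositionHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a Content-Disposition header."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Content-Disposition", DISPOSITION)
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging


def start_server(port=0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), DispositionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Interactive use (not executed here):
#   server = start_server(8000)
#   input("Press Enter to stop\n")
#   server.shutdown()
```

With the server listening on port 8000 (an arbitrary choice), running `wget -P outdir http://127.0.0.1:8000/test` with an affected build should, according to the report, leave the file in the current directory rather than in `outdir`. Commenting out the Content-Disposition line gives a control case for comparison.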





Wget in 1.11 beta 1 found

2006-12-11 Thread denis
Hi, dear developers!

  With the -P or --directory-prefix command-line switch, wget v1.11 Beta 1 and
the later v1.11 Beta 1 (with spider patch) ignores the option and saves files in
the current directory. This incorrect behaviour appears only if the server's
HTTP response contains a Content-Disposition header. Wget v1.10.2 handled this
correctly.
  I hope this bug won't live long :).

-- 
 denis  mailto:[EMAIL PROTECTED]