Problem recursive download

2003-10-16 Thread Sergey Vasilevsky
I use wget 1.8.2.
I tried to recursively download www.map-by.info/index.html, but wget
stops at the first page.
Why?
index.html has links to other pages.

/usr/local/bin/wget -np -r -N -nH --referer=http://map-by.info  -P
/tmp/www.map-by.info -D map-by.info http://map-by.info
http://www.map-by.info
--10:09:25--  http://map-by.info/
   => `/p4/poisk/spider/resource/www.map-by.info/index.html'
Resolving proxy.open.by... done.
Connecting to proxy.open.by[193.232.92.3]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: ignored [text/html]
Server file no newer than local file
`/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.

--10:09:25--  http://www.map-by.info/
   => `/p4/poisk/spider/resource/www.map-by.info/index.html'
Connecting to proxy.open.by[193.232.92.3]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: ignored [text/html]
Server file no newer than local file
`/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.


FINISHED --10:09:26--
Downloaded: 0 bytes in 0 files



RE: Problem recursive download

2003-10-16 Thread Sergey Vasilevsky
I think wget verifies link syntax strictly:
<a href=about_rus.html onMouseOver=img_on('main21');
onMouseOut=img_off('main21')>
That link has an incorrect symbol: the ';' is not quoted in the <a> tag.






Re: Problem recursive download

2003-10-16 Thread Hrvoje Niksic
This seems to work in my copy of 1.8.2.  Perhaps you have something in
your .wgetrc that breaks things?



Re: Problem recursive download

2003-10-16 Thread Hrvoje Niksic
Sergey Vasilevsky [EMAIL PROTECTED] writes:

 I think wget verifies link syntax strictly:
 <a href=about_rus.html onMouseOver=img_on('main21');
 onMouseOut=img_off('main21')>
 That link has an incorrect symbol: the ';' is not quoted in the <a> tag.

You are right.  However, this has been fixed in Wget 1.9-beta, which
will interpret the above as:

<a href=about_rus.html onmouseover=img_on('main21') ;=;
onmouseout=img_off('main21')>

In other words, the HREF part will be correctly picked up by Wget.

Wget 1.9 will be released soon.  If you want to try it out, get it
from http://fly.srk.fer.hr/~hniksic/wget/wget-1.9-b5.tar.gz.
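
To illustrate what "picking up the HREF part" of a sloppy tag involves,
here is a minimal sketch of a lenient attribute scan. It is not Wget's
actual parser; the function name find_href and its interface are
assumptions made up for this example. It only shows that an unquoted
';' in a later attribute need not prevent the href value from being
extracted.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Wget's parser: copy the value of the first
   href attribute in TAG into OUT (at most OUTLEN bytes including the
   terminator).  Returns 1 if an href was found and fits, 0 otherwise.  */
int
find_href (const char *tag, char *out, size_t outlen)
{
  const char *p = tag;

  while (*p)
    {
      const char *name, *val;
      size_t namelen, vallen;
      char quote = 0;

      /* Skip whitespace between tokens.  */
      while (*p && isspace ((unsigned char) *p))
        ++p;

      /* Scan a name up to '=' or whitespace.  */
      name = p;
      while (*p && *p != '=' && !isspace ((unsigned char) *p))
        ++p;
      namelen = (size_t) (p - name);

      /* A token without '=' (such as the tag name) is skipped.  */
      if (*p != '=')
        continue;
      ++p;

      /* The value may be quoted; if not, it ends at whitespace or '>'.  */
      if (*p == '"' || *p == '\'')
        quote = *p++;
      val = p;
      while (*p && (quote ? *p != quote
                          : (!isspace ((unsigned char) *p) && *p != '>')))
        ++p;
      vallen = (size_t) (p - val);
      if (quote && *p == quote)
        ++p;

      /* Attribute names are compared case-insensitively.  */
      if (namelen == 4
          && tolower ((unsigned char) name[0]) == 'h'
          && tolower ((unsigned char) name[1]) == 'r'
          && tolower ((unsigned char) name[2]) == 'e'
          && tolower ((unsigned char) name[3]) == 'f')
        {
          if (vallen + 1 > outlen)
            return 0;
          memcpy (out, val, vallen);
          out[vallen] = '\0';
          return 1;
        }
    }
  return 0;
}
```

Called on the tag from this thread, the sketch copies about_rus.html
into the output buffer and ignores the event-handler attributes.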



Wget 1.9-rc1 available for testing

2003-10-16 Thread Hrvoje Niksic
As the name implies, this should be 1.9 (with only version changed)
unless a show-stopper is discovered.  Get it from:

http://fly.srk.fer.hr/~hniksic/wget/wget-1.9-rc1.tar.gz


Problems with w2k in wget-1.9

2003-10-16 Thread Bloodflowers [Tuth 10]
First: the stupid errors (Winsock's fault)

It is true that the Winsock functions do not set errno, but it's
actually pretty simple to grab the error code:

errno = WSAGetLastError();

This should suffice. Unfortunately, even if the error is set properly,
strerror() will NOT give a useful message. It just falls back to
"Unknown error", which I must say is MUCH MUCH better than the stupid
errors it has been giving me, like "Bad file descriptor" or "No such
file or directory".
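
One way to get readable text for Winsock codes is a small lookup table
consulted before falling back to strerror(). The sketch below is only an
illustration: the name winsock_error_text and the message wording are
assumptions, and the table covers just a handful of codes. The numeric
values are the documented Winsock constants, written as plain numbers so
the example compiles without winsock.h.

```c
#include <stddef.h>
#include <string.h>

struct ws_error {
  int code;            /* Winsock error number */
  const char *text;    /* message chosen for this sketch */
};

static const struct ws_error ws_errors[] = {
  { 10054, "Connection reset by peer" },                /* WSAECONNRESET */
  { 10057, "Socket is not connected" },                 /* WSAENOTCONN */
  { 10060, "Connection timed out" },                    /* WSAETIMEDOUT */
  { 10061, "Connection refused" },                      /* WSAECONNREFUSED */
  { 10093, "Successful WSAStartup not yet performed" }  /* WSANOTINITIALISED */
};

/* Hypothetical helper: return a message for CODE.  Values below 10000
   (WSABASEERR) are treated as ordinary errno values.  */
const char *
winsock_error_text (int code)
{
  size_t i;

  if (code < 10000)
    return strerror (code);

  for (i = 0; i < sizeof ws_errors / sizeof ws_errors[0]; ++i)
    if (ws_errors[i].code == code)
      return ws_errors[i].text;

  return "Unknown Winsock error";
}
```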

But there are more problems with wget, and they're not recent. I've been
trying older versions of wget (1.8.2) and am having these very same
problems on W2K machines. Whenever a transfer fails, it starts giving
10093 errors and is never able to connect again.

According to MSDN, this is what it means:

WSANOTINITIALISED 10093
Successful WSAStartup not yet performed.
Either the application has not called WSAStartup or WSAStartup failed. The 
application may be accessing a socket that the current active task does not 
own (that is, trying to share a socket between tasks), or WSACleanup has 
been called too many times.

I haven't checked the error code that is given when the connection dies
(stupid me, I forgot to make wget print the error code; with a bit of
luck I'll know it by tomorrow).

I'm going to go over the code later on and see if I can track down the
problem. I'll post more if there's anything useful to report.

Thanks for everything :)




Re: Wget 1.9 about to be released

2003-10-16 Thread Tony Lewis
Hrvoje Niksic wrote:

 I'm about to release 1.9 today, unless it takes more time to upload it
 to ftp.gnu.org.
 
 If there's a serious problem you'd like fixed in 1.9, speak up now or
 be silent until 1.9.1.  :-)

I thought we were going to turn our attention to 1.10. :-)


Re: Wget 1.9 about to be released

2003-10-16 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 Hrvoje Niksic wrote:

 I'm about to release 1.9 today, unless it takes more time to upload it
 to ftp.gnu.org.
 
 If there's a serious problem you'd like fixed in 1.9, speak up now or
 be silent until 1.9.1.  :-)

 I thought we were going to turn our attention to 1.10. :-)

The two are not mutually exclusive.  There is now a branch for the 1.9
code.  That means that 1.10 can be worked on, and stability patches
applied to 1.9 in parallel.

Are there volunteers to take over maintaining the branch?  That would
mean following the -patches list and committing bug fixes (and *only*
bug fixes) to the 1.9 branch.



Re: errno patches for Windows

2003-10-16 Thread Hrvoje Niksic
[ Moving discussion from wget-patches to wget. ]

Gisle Vanem [EMAIL PROTECTED] writes:

 I'm pretty sure that other GNU applications -- that have also been
 ported to Windows -- use errno.  I wonder how they do it...

 Lynx uses this:
   #define SOCKET_ERRNO errno
   #ifdef WINDOWS
   #undef  SOCKET_ERRNO
   #define SOCKET_ERRNO WSAGetLastError()
   ..

 and never errno for network calls directly.  But then again, it
 never *sets* errno as Wget does.

OK.  So the whole thing with errno is only necessary when dealing with
Winsock errors.  For errors from, say, fopen it's fine to use errno?

There is another possible approach.  We already #define read and write
to call Winsock stuff.  We could add some more magic so that they and
other Winsock invocations automatically set errno to last error value,
translating Windows errors to errno errors.  XEmacs has code that
seems to support that kind of translation (Winsock errors might need
to be handled separately, but they could be added) -- please take a
look:

struct errentry {
  unsigned long oscode;  /* Win32 error */
  int errnocode;         /* unix errno */
};

static struct errentry errtable[] = {
  {  ERROR_INVALID_FUNCTION,       EINVAL    },  /* 1 */
  {  ERROR_FILE_NOT_FOUND,         ENOENT    },  /* 2 */
  {  ERROR_PATH_NOT_FOUND,         ENOENT    },  /* 3 */
  {  ERROR_TOO_MANY_OPEN_FILES,    EMFILE    },  /* 4 */
  {  ERROR_ACCESS_DENIED,          EACCES    },  /* 5 */
  {  ERROR_INVALID_HANDLE,         EBADF     },  /* 6 */
  {  ERROR_ARENA_TRASHED,          ENOMEM    },  /* 7 */
  {  ERROR_NOT_ENOUGH_MEMORY,      ENOMEM    },  /* 8 */
  {  ERROR_INVALID_BLOCK,          ENOMEM    },  /* 9 */
  {  ERROR_BAD_ENVIRONMENT,        E2BIG     },  /* 10 */
  {  ERROR_BAD_FORMAT,             ENOEXEC   },  /* 11 */
  {  ERROR_INVALID_ACCESS,         EINVAL    },  /* 12 */
  {  ERROR_INVALID_DATA,           EINVAL    },  /* 13 */
  {  ERROR_INVALID_DRIVE,          ENOENT    },  /* 15 */
  {  ERROR_CURRENT_DIRECTORY,      EACCES    },  /* 16 */
  {  ERROR_NOT_SAME_DEVICE,        EXDEV     },  /* 17 */
  {  ERROR_NO_MORE_FILES,          ENOENT    },  /* 18 */
  {  ERROR_LOCK_VIOLATION,         EACCES    },  /* 33 */
  {  ERROR_BAD_NETPATH,            ENOENT    },  /* 53 */
  {  ERROR_NETWORK_ACCESS_DENIED,  EACCES    },  /* 65 */
  {  ERROR_BAD_NET_NAME,           ENOENT    },  /* 67 */
  {  ERROR_FILE_EXISTS,            EEXIST    },  /* 80 */
  {  ERROR_CANNOT_MAKE,            EACCES    },  /* 82 */
  {  ERROR_FAIL_I24,               EACCES    },  /* 83 */
  {  ERROR_INVALID_PARAMETER,      EINVAL    },  /* 87 */
  {  ERROR_NO_PROC_SLOTS,          EAGAIN    },  /* 89 */
  {  ERROR_DRIVE_LOCKED,           EACCES    },  /* 108 */
  {  ERROR_BROKEN_PIPE,            EPIPE     },  /* 109 */
  {  ERROR_DISK_FULL,              ENOSPC    },  /* 112 */
  {  ERROR_INVALID_TARGET_HANDLE,  EBADF     },  /* 114 */
  {  ERROR_INVALID_HANDLE,         EINVAL    },  /* 124 */
  {  ERROR_WAIT_NO_CHILDREN,       ECHILD    },  /* 128 */
  {  ERROR_CHILD_NOT_COMPLETE,     ECHILD    },  /* 129 */
  {  ERROR_DIRECT_ACCESS_HANDLE,   EBADF     },  /* 130 */
  {  ERROR_NEGATIVE_SEEK,          EINVAL    },  /* 131 */
  {  ERROR_SEEK_ON_DEVICE,         EACCES    },  /* 132 */
  {  ERROR_DIR_NOT_EMPTY,          ENOTEMPTY },  /* 145 */
  {  ERROR_NOT_LOCKED,             EACCES    },  /* 158 */
  {  ERROR_BAD_PATHNAME,           ENOENT    },  /* 161 */
  {  ERROR_MAX_THRDS_REACHED,      EAGAIN    },  /* 164 */
  {  ERROR_LOCK_FAILED,            EACCES    },  /* 167 */
  {  ERROR_ALREADY_EXISTS,         EEXIST    },  /* 183 */
  {  ERROR_FILENAME_EXCED_RANGE,   ENOENT    },  /* 206 */
  {  ERROR_NESTING_NOT_ALLOWED,    EAGAIN    },  /* 215 */
  {  ERROR_NOT_ENOUGH_QUOTA,       ENOMEM    }   /* 1816 */
};

/* The following two constants must be the minimum and maximum
   values in the (contiguous) range of Exec Failure errors. */
#define MIN_EXEC_ERROR ERROR_INVALID_STARTING_CODESEG
#define MAX_EXEC_ERROR ERROR_INFLOOP_IN_RELOC_CHAIN

/* These are the low and high value in the range of errors that are
   access violations */
#define MIN_EACCES_RANGE ERROR_WRITE_PROTECT
#define MAX_EACCES_RANGE ERROR_SHARING_BUFFER_EXCEEDED

void
mswindows_set_errno (unsigned long win32_error)
{
  int i;

  /* check the table for the OS error code */
  for (i = 0; i < countof (errtable); ++i)
    {
      if (win32_error == errtable[i].oscode)
        {
          errno = errtable[i].errnocode;
          return;
        }
    }

  /* The error code wasn't in the table.  We check for a range of
   * EACCES errors or exec failure errors (ENOEXEC).  Otherwise EINVAL is
   * returned. */
  if (win32_error >= MIN_EACCES_RANGE && win32_error <= MAX_EACCES_RANGE)
    errno = EACCES;
  else if (win32_error >= MIN_EXEC_ERROR && win32_error <= MAX_EXEC_ERROR)
    errno = ENOEXEC;
  else
    errno = EINVAL;
}

void
mswindows_set_last_errno (void)
{
  mswindows_set_errno (GetLastError ());
}



Re: errno patches for Windows

2003-10-16 Thread Gisle Vanem
Hrvoje Niksic [EMAIL PROTECTED] said:

 OK.  So the whole thing with errno is only necessary when dealing with
 Winsock errors.  For errors from, say, fopen it's fine to use errno?

Yes.
 
 There is another possible approach.  We already #define read and write
 to call Winsock stuff.  We could add some more magic so that they and
 other Winsock invocations automatically set errno to last error value,
 translating Windows errors to errno errors. 

Then all Winsock functions must be wrapped in such macro.
E.g (untested):
#define SOCK_SELECT(fd,rd,wr,ex,tv)  ( \
int _rc = select (fd,rd,wr,ex,tv), \
(int)(WSAGetLastError() ? (errno = WSAGetLastError()) : (0)), \
_rc)

which could get messy; hard to return with a value from such a
macro.

 static struct errentry errtable[] = {
   {  ERROR_INVALID_FUNCTION,   EINVAL},  /* 1 */
   {  ERROR_FILE_NOT_FOUND, ENOENT},  /* 2 */

XEmacs is probably using native Win functions (e.g CreateFile
instead of fopen), so it needs to map them to Unix errnos. Wget only 
uses ANSI/Winsock functions, so only WS errors need attention.

Besides, on Windows there is no suitable errno.h value for
e.g. ENOTCONN; we must use the winsock*.h value WSAENOTCONN.
So the XEmacs method wouldn't work.

--gv



Re: errno patches for Windows

2003-10-16 Thread Hrvoje Niksic
Gisle Vanem [EMAIL PROTECTED] writes:

 There is another possible approach.  We already #define read and write
 to call Winsock stuff.  We could add some more magic so that they and
 other Winsock invocations automatically set errno to last error value,
 translating Windows errors to errno errors. 

 Then all Winsock functions must be wrapped in such macro.
 E.g (untested):
 #define SOCK_SELECT(fd,rd,wr,ex,tv)  ( \
 int _rc = select (fd,rd,wr,ex,tv), \
 (int)(WSAGetLastError() ? (errno = WSAGetLastError()) : (0)), \
 _rc)

 which could get messy; hard to return with a value from such a
 macro.

How about:

#ifdef WINDOWS
# define select(a, b, c, d, e) windows_select (a, b, c, d, e)
#endif

windows_select can be a function defined in mswindows.c that calls
select, performs the necessary error-handling magic, and (easily)
returns a value.  BTW errno should only be modified if _rc < 0.
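
The shape of such a wrapper can be sketched without Windows headers by
replacing the real calls with stubs. Everything named stub_* below is a
stand-in invented for this example (for select() and WSAGetLastError()
respectively), and windows_select_sketch is a hypothetical name; a real
windows_select would take and forward the usual select() arguments.

```c
#include <errno.h>

/* Stand-ins (assumptions) so the sketch compiles anywhere.  On Windows
   these would be the real select() and WSAGetLastError().  */
static int stub_result;       /* value the stubbed call returns */
static int stub_last_error;   /* value the stubbed error query returns */

static int
stub_select (void)
{
  return stub_result;
}

static int
stub_get_last_error (void)
{
  return stub_last_error;
}

/* Call the underlying function and copy the last socket error into
   errno, but only when the call failed.  */
int
windows_select_sketch (void)
{
  int rc = stub_select ();

  if (rc < 0)
    errno = stub_get_last_error ();
  return rc;
}
```

Because this is a function rather than a macro, returning the result
while also updating errno is straightforward, and a successful call
leaves errno untouched.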

 static struct errentry errtable[] = {
   {  ERROR_INVALID_FUNCTION,   EINVAL},  /* 1 */
   {  ERROR_FILE_NOT_FOUND, ENOENT},  /* 2 */

 XEmacs is probably using native Win functions (e.g CreateFile
 instead of fopen), so it needs to map them to Unix errnos.

I believe it calls them in some places, and there it calls
mswindows_set_last_errno to fix up errno in case the callers inspect
it.  This is similar to the strategy I'd like to use in Wget.

 Wget only uses ANSI/Winsock functions, so only WS errors need
 attention.

OK.

 Besides, on Windows there is no suitable errno.h value for
 e.g. ENOTCONN; we must use the winsock*.h value WSAENOTCONN.  So
 the XEmacs method wouldn't work.

Note that we could always add a version of strerror that supports
those.  For example:

/* mswindows.h */

enum { X_ERRBASE = 10000, X_ENOTCONN = 10001, ... };

#ifndef ENOTCONN
# define ENOTCONN X_ENOTCONN
#endif
...

#define strerror(n) windows_strerror (n)

/* mswindows.c */

#undef strerror /* we want the real one here */

const char *
windows_strerror (int err)
{
  /* Leave the standard ones to strerror. */
  if (err < X_ERRBASE)
    return strerror (err);

  /* Handle the unsupported ones manually. */
  switch (err)
    {
    case X_ENOTCONN:
      return "Socket is not connected";
    ...
    default:
      abort ();
    }
  return NULL;
}

I know this looks like an unnecessary contortion at first, but Wget
*is* targeted to primarily support Unix-like OS'es, specifically GNU.
In this case I believe it makes sense to make the Windows-specific
code more complex to reduce the complexity in the code that assumes
Unix API's.



Re: errno patches for Windows

2003-10-16 Thread Gisle Vanem
Hrvoje Niksic [EMAIL PROTECTED] said:

 #ifdef WINDOWS
 # define select(a, b, c, d, e) windows_select (a, b, c, d, e)
 #endif

Okay by me.
 
 #ifndef ENOTCONN
 # define ENOTCONN X_ENOTCONN
 #endif

Except you cannot make Winsock return X_ENOTCONN.
It returns WSAENOTCONN (def'ed to ENOTCONN in
mswindows.h). Winsock errors are in the range
WSABASEERR (10000) to 11031, with some holes in
the range.

 const char *
 windows_strerror (int err)
 {
   /* Leave the standard ones to strerror. */
   if (err < X_ERRBASE)
     return strerror (err);

   /* Handle the unsupported ones manually. */
   switch (err)
     {
     case X_ENOTCONN:
       return "Socket is not connected";

Which AFAICS is pretty much the same as in my patch.

Another thing is that Wget could mask errnos on Unix
too. In connect.c:

 ...
   {
     CLOSE (sock);
     sock = -1;
     goto out;
   }

out:
...
 else
   {
     save_errno = errno;
     if (!silent)
       logprintf (LOG_VERBOSE, "failed: %s.\n", strerror (errno));
     errno = save_errno;
   }

The close() could possibly set errno too, but we want the errno
from bind() or connect(), don't we?

--gv
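
The concern about close() overwriting errno can be addressed by saving
errno before the call and restoring it afterwards. A minimal sketch for
POSIX systems follows; close_keeping_errno is a hypothetical helper
name, not something taken from connect.c.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close FD without disturbing the errno value
   left behind by an earlier failed call such as bind() or connect().
   Returns whatever close() returned.  */
int
close_keeping_errno (int fd)
{
  int saved = errno;      /* remember the error we care about */
  int rc = close (fd);    /* may set errno, e.g. to EBADF */

  errno = saved;          /* put the original error back */
  return rc;
}
```

With this in place, code that reports strerror (errno) after the cleanup
still sees the error from the failed network call.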