Re: Navigation Bar cannot be displayed after wgetting www.doxygen.nl

2024-01-11 Thread Gisle Vanem

Haowei Hsu wrote:


  wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.doxygen.nl/

However, everything seems well except that the Navigation Bar cannot be
displayed.

[image: image.png]

What happened? Is this a bug of Wget? If so, how to fix this?


Trying this myself on Win-10, I get "Too many open files"
after 505 files were saved! As if Wget or Gnulib is not saving
a file correctly (?).


- OS version: Windows 11
- Wget version: 1.21.4


Same as my wget version.


--
--gv



Re: Issues on installation

2023-12-14 Thread Gisle Vanem

Noah Kpogo wrote:


Please I'm  trying to install nethunter using wget on termux, they only
keep saying invalid option. What should I do?


A typo. Here,
  wget -O install-nethunder-termux htrps://offs.ec/2MceZW

gives "htrps://offs.ec/2MceZW: Unsupported scheme."

Fixing that, I get "ERROR 404: Not Found."

--
--gv



Re: Adding MPTCP option to wget

2023-06-05 Thread Gisle Vanem

Bastien Wiaux wrote:

diff --git a/src/connect.h b/src/connect.h
index d03a1708..e50fafe7 100644
--- a/src/connect.h
+++ b/src/connect.h
@@ -86,4 +86,38 @@ int select_fd_nb (int, double, int);
 #define select_fd_nb select_fd
 #endif

+#ifdef ENABLE_MPTCP
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+#include 
+#ifndef SOL_MPTCP
+#define SOL_MPTCP 284
+#endif

---

This won't compile for non-Linux. Should perhaps be:

#if defined(ENABLE_MPTCP)
  #if defined(__linux__)
  #include 
  #else
  #error "Only Linux is supported for 'ENABLE_MPTCP'"
  #endif
#endif

/* make the rest compile with 'opt.mptcp == true'
 */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

#ifndef SOL_MPTCP
#define SOL_MPTCP 284
#endif
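If a runtime fallback were preferred over a hard '#error', one option is to ask for an MPTCP socket and quietly drop back to plain TCP when the OS refuses it. A minimal sketch, assuming a POSIX sockets API; 'open_stream_socket()' is a hypothetical helper, not code from the MPTCP patch:

```c
#include <stdbool.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Linux's protocol number for MPTCP, the same fallback value as above. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Hypothetical helper: try an MPTCP stream socket when 'want_mptcp' is set
   and fall back to plain TCP if that fails for any reason (non-Linux OS,
   kernel without MPTCP, MPTCP disabled).  '*used_mptcp' reports which one
   the caller got.  Returns a descriptor, or -1 with errno set. */
static int open_stream_socket (int family, bool want_mptcp, bool *used_mptcp)
{
  *used_mptcp = false;
  if (want_mptcp)
    {
      int fd = socket (family, SOCK_STREAM, IPPROTO_MPTCP);
      if (fd >= 0)
        {
          *used_mptcp = true;
          return fd;
        }
    }
  return socket (family, SOCK_STREAM, IPPROTO_TCP);
}
```

On Linux with MPTCP enabled, '*used_mptcp' comes back true; elsewhere the caller still gets an ordinary TCP socket.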


--
--gv



Re: wget fails with this url

2022-05-20 Thread Gisle Vanem

ge...@mweb.co.za wrote:


I tried this from South Africa and I am getting the exact behaviour as the OP.


Okay. Now I tested with various other older Wgets:
  the one bundled with Ruby (a msys2 built version) -> 403 Forbidden
  the one bundled with GNU Octave-6.4 -> 403 Forbidden
  Lumito's 'wget2.exe', same bleeding 403.

I forgot to state, I got no '403' with my home-built
Wget from git master (on Windows-10). So this is probably
a bug that's been fixed. So you could upgrade and try
again.




Re: wget fails with this url

2022-05-20 Thread Gisle Vanem

i...@mbsoft.biz wrote:


I'm trying to download an audio file from this url:

https://api.spreaker.com/v2/episodes/6645151/download.mp3


...


Resolving d1bxy2pveef3fq.cloudfront.net (d1bxy2pveef3fq.cloudfront.net)...
18.66.200.93, 18.66.200.230, 18.66.200.32, ...
Connecting to d1bxy2pveef3fq.cloudfront.net
(d1bxy2pveef3fq.cloudfront.net)|18.66.200.93|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-05-20 01:19:11 ERROR 403: Forbidden.


Works fine for me in Wget. But I'm in Norway and you seem
to be in Italy. So it could be a licensing/copyright issue.

--
--gv



Re: Wget1.21 fails to build on MinGW(Windows)

2021-01-06 Thread Gisle Vanem

G-Ey3dr wrote:


Wget1.21 fails to build on MinGW(Windows).
Is it okay to not use getrandom() roughly?


C:/msys32/mingw32/bin/../lib/gcc/i686-w64-mingw32/10.2.0/../../../../i686-w64-mingw32/bin/ld.exe:
../lib/libgnu.a(getrandom.o): in function `getrandom':
C:\home\user3\src\wget-1.21/lib\getrandom.c:129: undefined reference to
`BCryptGenRandom@16'
collect2.exe: error: ld returned 1 exit status


Why not simply link with 'libbcrypt.a'?
--
--gv



Re: test wget_options_fuzzer seems never pass

2020-07-27 Thread Gisle Vanem

YX Hao wrote:


2 minor fixes for my previous commits:
Optimize deduping expression for version info
https://github.com/lifenjoiner/wget-for-windows/commit/8efc59dffc547345168239c0b9f70ba1ffcf6e0e


How does this make 'wget -V' and 'link_flags' look different?
I fail to see what that 'sed' expression does.
With my home-brewed GNU-makefile it shows:

Link:
link -nologo -map -debug -verbose -nodefaultlib:uuid.lib
-incremental:no -out:wget.exe MSVC_obj/build_info.obj
MSVC_obj/connect.obj MSVC_obj/convert.obj MSVC_obj/cookies.obj
MSVC_obj/css-url.obj MSVC_obj/css.obj MSVC_obj/exits.obj
MSVC_obj/ftp-basic.obj MSVC_obj/ftp-ls.obj MSVC_obj/ftp-opie.obj
MSVC_obj/ftp.obj MSVC_obj/hacks.obj MSVC_obj/hash.obj
MSVC_obj/host.obj MSVC_obj/hsts.obj MSVC_obj/html-parse.obj
MSVC_obj/html-url.obj MSVC_obj/http-ntlm.obj MSVC_obj/http.obj
MSVC_obj/init.obj MSVC_obj/iri.obj MSVC_obj/log.obj
MSVC_obj/main.obj MSVC_obj/mswindows.obj MSVC_obj/netrc.obj
MSVC_obj/openssl.obj MSVC_obj/progress.obj MSVC_obj/ptimer.obj
MSVC_obj/recur.obj MSVC_obj/res.obj MSVC_obj/retr.obj
MSVC_obj/spider.obj MSVC_obj/tbprogress.obj MSVC_obj/url.obj
MSVC_obj/utils.obj MSVC_obj/warc.obj MSVC_obj/win-ver.obj
MSVC_obj/xattr.obj MSVC_obj/wget.res
f:/MinGW32/src/inet/Crypto/OpenSSL/lib32/libssl.lib
f:/MinGW32/src/inet/Crypto/OpenSSL/lib32/libcrypto.lib
f:/MinGW32/src/inet/Crypto/OpenSSL/lib32/libcommon.lib
f:/MinGW32/src/inet/dns/C-ares/cares.lib advapi32.lib
f:/MinGW32/src/RegExp/libpcre/pcre.lib
f:/MinGW32/src/Compression/zlib-1.2.8/lib/x86/zlib.lib
f:/MinGW32/src/inet/IDN/libidn2/idn2.lib
f:/MinGW32/src/gnu/gnulib/lib/gnulib-vc.lib wsock_trace.lib
ole32.lib user32.lib

(an all-static wget...)


Remove '-lgdi32' from linking on Windows


Maybe someone with an old static OpenSSL would still need it:
  https://stackoverflow.com/questions/40656311/zlib-and-gdi32-with-openssl

--
--gv



Re: Wget for Windows unicode issue

2020-05-12 Thread Gisle Vanem

This works here:
  wget.exe --local-encoding=utf-8 -i url-file.txt

With a 'url-file.txt' containing:
  www.seoghør.no

I used 'https://cafewebmaster.com/online_tools/utf8_encode'
to UTF-8 encode '.seoghør.no' into the above.

Wget downloaded a 300 kB index.html, except that I had to
use '--no-check-certificate'!
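For reference, the bytes the URL file needs: 'ø' is U+00F8 and encodes in UTF-8 as the two bytes C3 B8. A small standalone encoder (a hypothetical helper, not something Wget itself uses) shows the arithmetic:

```c
#include <stddef.h>

/* Encode one Unicode code point (U+0000..U+10FFFF, surrogates excluded)
   as UTF-8.  Writes up to 4 bytes to 'out' and returns how many were
   written, or 0 if 'cp' is not a valid code point. */
static size_t utf8_encode (unsigned long cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* UTF-16 surrogates are not characters */
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

For 'ø': 0xF8 >> 6 is 3, giving the lead byte 0xC3; 0xF8 & 0x3F is 0x38, giving the continuation byte 0xB8.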

--
--gv



Warning in http.c

2020-02-21 Thread Gisle Vanem

Hi list.

Just built Wget (master) using clang-cl and noticed this important
warning:
  http.c(874,11): warning:
  ordered comparison between pointer and integer ('size_t' (aka 'unsigned int') and 'char *')
     if (len < buf)
         ~~~ ^ ~~~

For the line:
  if (len < buf)
    copy = buf;

(which should always be true).

Surely it should read:
  if (len < sizeof(buf))
    copy = buf;
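A minimal sketch of the stack-or-heap copy that line appears to intend. 'copy_maybe_on_stack()' is hypothetical; it takes the buffer size as a parameter because 'sizeof(buf)' only yields the array size where 'buf' is an actual array in scope, not a pointer:

```c
#include <stdlib.h>
#include <string.h>

/* Copy 'len' bytes of 'src' plus a terminating NUL, using the caller's
   buffer when it is big enough and the heap otherwise.  The test compares
   'len' with the buffer's size; 'len < buf' would compare an integer with
   an address, which is what clang warns about. */
static char *copy_maybe_on_stack (const char *src, size_t len,
                                  char *buf, size_t bufsize)
{
  char *copy;

  if (len < bufsize)
    copy = buf;
  else if ((copy = malloc (len + 1)) == NULL)
    return NULL;
  memcpy (copy, src, len);
  copy[len] = '\0';
  return copy;
}
```

The caller frees the result only when it differs from 'buf'.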


--
--gv



Re: [Bug-wget] Fwd: Fwd: Re: RESEND1: wget-1.20-win32

2019-05-18 Thread Gisle Vanem

WQ wrote:


*Also :*

https://saimei.ftp.acc.umu.se/mirror/ipfire.org/releases/ipfire-2.x/2.23-core131/ipfire-2.23.x86_64-full-core131.iso


I see no scroll problem with this file. My console width is 140 characters.
But I see the filename is truncated by 1 character; '.iso' -> '.is':

ipfire-2.23.x86_64-full-core131.is 100%[>] 256.00M  9.58MB/s    in 27s

2019-05-18 10:04:42 (9.42 MB/s) - 'ipfire-2.23.x86_64-full-core131.iso' saved [268435456/268435456]




Re: [Bug-wget] Error running command

2019-03-03 Thread Gisle Vanem

Siva Kumar wrote:


Hi please help me with this

when i run this command i get "Disabling SSL due to encountered error"

wget https://www.google.com


What OS, GnuTLS or OpenSSL, and which wget version
('wget -V')? Try 'wget -d https://www.google.com' for more details.

Looks like 'ssl_init()' returns false which could be
caused by many things.

--
--gv



Re: [Bug-wget] How to change SYSTEM_WGETRC?

2018-07-18 Thread Gisle Vanem

Jeffrey Walton wrote:


When I check my locally installed wget --version it is showing the wrong wgetrc:

 $ command -v wget
 /usr/local/bin/wget

 $ wget --version
 GNU Wget 1.19.5 built on linux-gnu.
 ...
 Wgetrc:
 /etc/wgetrc (system)

I installed an updated wgetrc based on sample.wgetrc in $PREFIX/etc
but it appears it is not being used.


Try setting an env-var 'WGETRC' pointing to your wget rc-file.
Works on Windows at least.

--
--gv



Re: [Bug-wget] TLS1.3 via GnuTLS

2018-07-16 Thread Gisle Vanem

Tim Rühsen wrote:


GnuTLS 3.6.3 has been released today with TLS1.3 support (latest draft).

So if you rebuild/link wget or wget2 with the new GnuTLS version, you
can enable TLS1.3 via --ciphers="NORMAL:+VERS-TLS1.3"  (wget) resp.
--gnutls-options="NORMAL:+VERS-TLS1.3" (wget2).


Not for me:
  wget.exe --ciphers="NORMAL:+VERS-TLS1.3" --secure-protocol=TLSv1_3 https://www.google.com
  --2018-07-16 17:00:43--  https://www.google.com/
  Resolving www.google.com (www.google.com)... 216.58.207.196
  Connecting to www.google.com (www.google.com)|216.58.207.196|:443... connected.
  GnuTLS: Error in the push function.
  Unable to establish SSL connection.

Or worse, with:
  wget.exe --secure-protocol=TLSv1_3 https://www.google.com

I get an 'abort()' inside GnuTLS.

With Loganaden's patch, it doesn't "crash" (i.e. abort). But it
still fails with the same infamous "Error in the push function".

I'll stick to OpenSSL.

--
--gv



Re: [Bug-wget] [unit-test] Crash on test_hsts_new_entry()

2018-05-29 Thread Gisle Vanem

Tim Rühsen wrote:


I just merged another branch into master, this issue seemed to be fixed
in there. Please try again latest master.


Thanks, unit-test.exe works fine now.

BTW. Why 2 versions of these macros:
'mu_assert()' and 'mu_run_test()'?

Or even 2 files with exactly the same content:
src/test.h + tests/unit-tests.h.
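For readers not familiar with them, these are MinUnit-style macros. A rough sketch of the pattern, loosely modelled on the 'RUNNING TEST ...' output quoted in the crash report later in this thread; it is an assumption about what the two headers contain, not a copy of either:

```c
#include <stddef.h>
#include <stdio.h>

/* Return the message from the current test function if 'test' is false. */
#define mu_assert(message, test) \
  do { if (!(test)) return message; } while (0)

/* Run one test function, count it, and stop at the first failure. */
#define mu_run_test(test)                   \
  do {                                      \
    const char *mu_msg;                     \
    puts ("RUNNING TEST " #test "...");     \
    mu_msg = test ();                       \
    tests_run++;                            \
    if (mu_msg != NULL)                     \
      return mu_msg;                        \
  } while (0)

static int tests_run = 0;

static const char *test_addition (void)
{
  mu_assert ("1 + 1 should equal 2", 1 + 1 == 2);
  return NULL;
}

static const char *test_literal_length (void)
{
  mu_assert ("\"abc\" should have 3 chars", sizeof "abc" - 1 == 3);
  return NULL;
}

static const char *all_tests (void)
{
  /* Shared setup belongs here, before any test that depends on it. */
  mu_run_test (test_addition);
  mu_run_test (test_literal_length);
  return NULL;
}
```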


--
--gv



[Bug-wget] [unit-test] Crash on test_hsts_new_entry()

2018-05-22 Thread Gisle Vanem

I've built unit-test on Windows (clang-cl). But when running
it, it crashes after the message:
  RUNNING TEST test_hsts_new_entry...

Since 'opt.homedir' is NULL, 'get_hsts_store_filename()' also
returns NULL. How is 'opt.homedir' supposed to be set?

If I add:
  opt.homedir = home_dir();

to 'all_tests()', I do get the correct %HOME path (equals %APPDATA).
But it seems 'opt.homedir' gets cleared afterwards somewhere.
In test_cmd_spec_restrict_file_names() or test_path_simplify()?

So if I do this, all tests pass:

--- a/tests/unit-tests.c 2018-05-21 17:59:47
+++ b/tests/unit-tests.c 2018-05-22 15:00:19
@@ -43,11 +43,19 @@
 static const char *
 all_tests(void)
 {
+  opt.homedir = home_dir();
+
 #ifdef HAVE_METALINK
   mu_run_test (test_find_key_value);
   mu_run_test (test_find_key_values);
   mu_run_test (test_has_key);
 #endif
+#ifdef HAVE_HSTS
+  mu_run_test (test_hsts_new_entry);
+  mu_run_test (test_hsts_url_rewrite_superdomain);
+  mu_run_test (test_hsts_url_rewrite_congruent);
+  mu_run_test (test_hsts_read_database);
+#endif
   mu_run_test (test_parse_content_disposition);
   mu_run_test (test_parse_range_header);
   mu_run_test (test_subdir_p);
@@ -58,12 +66,6 @@
   mu_run_test (test_append_uri_pathel);
   mu_run_test (test_are_urls_equal);
   mu_run_test (test_is_robots_txt_url);
-#ifdef HAVE_HSTS
-  mu_run_test (test_hsts_new_entry);
-  mu_run_test (test_hsts_url_rewrite_superdomain);
-  mu_run_test (test_hsts_url_rewrite_congruent);
-  mu_run_test (test_hsts_read_database);
-#endif

--
--gv



Re: [Bug-wget] About GSoC project: Support QUIC Protocol

2018-03-08 Thread Gisle Vanem

Jay Bhavsar wrote:


But you should of course base any QUIC work for wget on a QUIC library.


So I'm browsing the libraries. And ngtcp2 seems interesting. Can we use it
in wget2? I'm also considering other options. Any suggestions on picking a
particular library?


I agree on ngtcp2. Foremost because it seems to have good support
for MSVC/Windows. My next contender would be MozQuic. Written in C++,
but with C interface. A bit of a bummer for Wget2 or libcurl?

--
--gv



[Bug-wget] [Win32] fork_to_background() broken

2018-02-21 Thread Gisle Vanem

The recent change of prototype of 'fork_to_background()'
broke the Windows build:
  mswindows.c(329,1) :  error: conflicting types for 'fork_to_background'
  fork_to_background (void)
  ^
  ./utils.h(74,6) :  note: previous declaration is here
  bool fork_to_background (void);
   ^

A simplistic patch:

--- a/src/mswindows.c 2018-02-21 16:03:16
+++ b/src/mswindows.c 2018-02-21 16:24:06

@@ -312,7 +312,7 @@

 /* This is the corresponding Windows implementation of the
fork_to_background() function in utils.c.  */
-void
+bool
 fork_to_background (void)
 {
   int rv;
@@ -332,6 +345,18 @@
   abort ();
 }
   /* If we get here, we're the child.  */
+  return false;
 }

(ignoring the 'logfile_changed' stuff).

--
--gv



Re: [Bug-wget] Misuse of idn2_free()

2017-04-08 Thread Gisle Vanem

A bit simpler patch:

--- a/url.c 2017-04-08 11:24:21
+++ b/url.c 2017-04-08 12:01:07
@@ -943,7 +943,8 @@
   if (new)
 {
   xfree (u->host);
-  u->host = new;
+  u->host = xstrdup (new);
+  idn2_free (new);
   host_modified = true;
 }
 }
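The idea behind this simpler patch, sketched generically: copy a library-allocated string onto our own heap and hand the original straight back to the library's deallocator. 'lib_strdup()', 'lib_free()' and 'adopt_lib_string()' are hypothetical stand-ins (think libidn2 and idn2_free()); in this sketch both heaps are the same, whereas a MinGW-built DLL and an MSVC-built program may not share one:

```c
#include <stdlib.h>
#include <string.h>

/* Our own allocator (stands in for Wget's xstrdup()/xfree()). */
static char *my_strdup (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = malloc (n);
  if (p != NULL)
    memcpy (p, s, n);
  return p;
}

/* Hypothetical library allocator pair; pretend these live in the DLL. */
static char *lib_strdup (const char *s) { return my_strdup (s); }
static void lib_free (void *p) { free (p); }

/* Take ownership of a library-allocated string: copy it, then return the
   original to the library at once, so the rest of the program only ever
   frees the copy with free()/xfree(). */
static char *adopt_lib_string (char *lib_str, void (*lib_dealloc) (void *))
{
  char *copy;

  if (lib_str == NULL)
    return NULL;
  copy = my_strdup (lib_str);
  lib_dealloc (lib_str);
  return copy;
}
```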

--
--gv



Re: [Bug-wget] Misuse of idn2_free()

2017-04-08 Thread Gisle Vanem

Tim Rühsen wrote:


Thanks, Gisle.

pushed with several additional fixes/cleanups regarding idn2.


Too many cleanups, I guess, since it's crashing because 'idn2_free()'
is not called when needed. This works here:

--- a/url.c 2017-04-08 11:24:21
+++ b/url.c 2017-04-08 11:38:37
@@ -944,6 +944,7 @@
 {
   xfree (u->host);
   u->host = new;
+  u->idn_allocated = true;
   host_modified = true;
 }
 }
@@ -1222,6 +1223,9 @@
 {
   if (url)
 {
+  if (url->idn_allocated)
+idn2_free (url->host);
+  else
   xfree (url->host);

   xfree (url->path);

--- a/url.h 2017-04-08 11:24:21
+++ b/url.h 2017-04-08 11:35:26
@@ -84,6 +84,7 @@
   enum url_scheme scheme;   /* URL scheme */

   char *host;   /* Extracted hostname */
+  bool idn_allocated;   /* 'host' allocated by libidn2 */
   int port; /* Port number */

---
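The flag-based approach of this patch, reduced to a standalone sketch. 'struct url_sketch', 'set_host_from_lib()' and 'lib_free()' are hypothetical; the point is that every path that frees 'host' consults the flag to pick the matching deallocator:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library deallocator, standing in for idn2_free(). */
static void lib_free (void *p) { free (p); }

struct url_sketch {
  char *host;
  bool idn_allocated;   /* 'host' came from the library's allocator */
};

/* Release 'host' with whichever allocator produced it. */
static void release_host (struct url_sketch *u)
{
  if (u->idn_allocated)
    lib_free (u->host);
  else
    free (u->host);
  u->host = NULL;
  u->idn_allocated = false;
}

/* Replace the host with a library-allocated string and remember that. */
static void set_host_from_lib (struct url_sketch *u, char *lib_host)
{
  release_host (u);
  u->host = lib_host;
  u->idn_allocated = true;
}
```

Compared with copying, this saves an allocation, but every free path has to remember the flag.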

--
--gv



[Bug-wget] Misuse of idn2_free()

2017-04-08 Thread Gisle Vanem

The 'idn_decode()' function now simply uses 'xstrdup()'.
And in host.c + connect.c there are calls to 'idn2_free()'
on this pointer:
 if (opt.enable_iri && (name = idn_decode ((char *) print)) != NULL)
  {
   int len = strlen (print) + strlen (name) + 4;
   str = xmalloc (len);
   snprintf (str, len, "%s (%s)", name, print);
   str[len-1] = '\0';
   idn2_free (name);   << !
  }

Since the above 'name' is NOT from libidn, I get a crash when
mixing a MinGW built libidn2.dll with a MSVC built Wget.exe.

I think someone forgot to change the above code when 'idn_decode()'
got simplified. This patch works for me:

--- a/connect.c 2017-01-19 21:37:55
+++ b/connect.c 2017-04-08 09:57:24
@@ -284,7 +284,7 @@
   str = xmalloc (len);
   snprintf (str, len, "%s (%s)", name, print);
   str[len-1] = '\0';
-  idn2_free (name);
+  xfree (name);
 }

   logprintf (LOG_VERBOSE, _("Connecting to %s|%s|:%d... "),

--- a/host.c 2017-01-19 21:37:55
+++ b/host.c 2017-04-08 10:02:42
@@ -850,7 +850,7 @@
   str = xmalloc (len);
   snprintf (str, len, "%s (%s)", name, host);
   str[len-1] = '\0';
-  idn2_free (name);
+  xfree (name);
 }

   logprintf (LOG_VERBOSE, _("Resolving %s... "),



--gv



Re: [Bug-wget] inet_ntop() in mswindows.c

2017-03-08 Thread Gisle Vanem
Tim Rühsen wrote:

> In src/hosts.c put it after
> #include 
> 
> And also remove the
> # ifndef __BEOS__
> ...
> part.
> 
> I guess we should do some more cleanups there.

For sure. IMHO the best place would be in mswindows.h
(thus only one place). But I have so many private changes to mswindows.*
that it's hard for me to create a patch. But something like:

--- a/src/mswindows.h 2016-05-14 17:43:51
+++ b/src/mswindows.h 2017-03-08 10:43:33
@@ -57,6 +57,9 @@
 /* Declares getpid(). */
 #include 

+/* Declares inet_ntop() and inet_pton(). */
+#include 
+
 /* We have strcasecmp and strncasecmp, just under different names.  */
 #ifndef HAVE_STRCASECMP
 # define strcasecmp stricmp
@@ -85,12 +88,8 @@

 #define PATH_SEPARATOR '\\'

-/* Additional declarations needed for IPv6: */
-#ifdef ENABLE_IPV6
-const char *inet_ntop (int, const void *, char *, socklen_t);
-#endif
-
-/* ioctl needed by set_windows_fd_as_blocking_socket() */
+/* We need to include this header since Gnulib's unlink() could be defined as
+   rpl_unlink(). And because 'struct options' has no 'rpl_unlink' member. */
 #include 

 /* Public functions.  */



(also explains why  is needed).

-- 
--gv



[Bug-wget] inet_ntop() in mswindows.c

2017-03-06 Thread Gisle Vanem
Just a detail, but in src/mswindows.c, there is:

  #ifdef ENABLE_IPV6
  /* An inet_ntop implementation that uses WSAAddressToString.
 Prototype complies with POSIX 1003.1-2004.  This is only used under
 IPv6 because Wget prints IPv4 addresses using inet_ntoa.  */

This is wrong because 1) inet_ntoa() is no longer used, and 2)
inet_ntop() is used for IPv4 too. So 'ENABLE_IPV6' should become
'!defined(HAVE_INET_NTOP)'. Thus:

@@ -572,10 +572,10 @@
 }


-#ifdef ENABLE_IPV6
+#if !defined(HAVE_INET_NTOP)
 /* An inet_ntop implementation that uses WSAAddressToString.
-   Prototype complies with POSIX 1003.1-2004.  This is only used under
-   IPv6 because Wget prints IPv4 addresses using inet_ntoa.  */
+   Prototype complies with POSIX 1003.1-2004.  This is used
+   for both IPv4 and IPv6.  */

 const char *
 inet_ntop (int af, const void *src, char *dst, socklen_t cnt)
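Since a single inet_ntop() call covers both families, a POSIX round trip through inet_pton()/inet_ntop() looks the same for IPv4 and IPv6. A minimal sketch ('canonical_addr()' is hypothetical, not Wget code):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>       /* strcmp() in the usage example */
#include <sys/socket.h>

/* Parse a textual IPv4 or IPv6 address and print it back in canonical
   form.  Returns 'dst', or NULL if 'text' is neither kind of address or
   'dst' is too small. */
static const char *canonical_addr (const char *text, char *dst, socklen_t size)
{
  unsigned char raw[sizeof (struct in6_addr)];   /* fits either family */

  if (inet_pton (AF_INET, text, raw) == 1)
    return inet_ntop (AF_INET, raw, dst, size);
  if (inet_pton (AF_INET6, text, raw) == 1)
    return inet_ntop (AF_INET6, raw, dst, size);
  return NULL;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either family.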

--gv



Re: [Bug-wget] [bug #50223] wget 1.19 will not build on MacOS 10.12.3

2017-02-03 Thread Gisle Vanem
Charles wrote:

> The `make` step fails with this:
> 
> --8<
> /Applications/Xcode.app/Contents/Developer/usr/bin/make  all-am
>   CC   connect.o
>   CC   convert.o
>   CC   cookies.o
>   CC   ftp.o
> ftp.c:1466:19: error: no member named 'rpl_unlink' in 'struct options'
>   if (opt.unlink && file_exists_p (con->target))
>   ~~~ ^
> ../lib/unistd.h:1851:19: note: expanded from macro 'unlink'
> #   define unlink rpl_unlink

Looks like the same issue I had with mswindows.h.
The fix was to put '#include ' before
Gnulib's unlink() got a chance to get defined as
rpl_unlink().
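A standalone illustration of why the include order matters. The '#define' below simulates what a Gnulib replacement header may do; 'struct options' and 'should_unlink()' are simplified, hypothetical stand-ins. Because the struct is declared after the macro, its member and every 'opt.unlink' use are renamed consistently. The build error above is what happens when the declaration and a use see different definitions:

```c
#include <stdbool.h>

/* Simulated replacement: from here on, every 'unlink' token, including
   a struct member name, becomes 'rpl_unlink'. */
#define unlink rpl_unlink

/* Declared after the macro, so the member is really 'rpl_unlink'. */
struct options {
  bool unlink;
};

static struct options opt;

/* 'opt.unlink' expands to 'opt.rpl_unlink', matching the declaration. */
static bool should_unlink (void)
{
  return opt.unlink;
}
```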

-- 
--gv



Re: [Bug-wget] strerror() on Win32

2016-10-14 Thread Gisle Vanem
Eli Zaretskii wrote:

> My guess is that for some reason Wget calls the MS-Windows strerror,
> not its Gnulib replacement.  But that's a guess, and I don't know how
> to explain it.  Perhaps put a breakpoint both at the Gnulib strerror
> and the MS runtime one, and see what happens in your scenario.

Okay, I'll look into it; putting some trace in Gnulib's strerror() etc.

> Failing that, if you can show a recipe for reproducing this, including
> a URL to use, I could see what happens on my system, and maybe we will
> see the light.

I didn't find a testing-host/URL that would result in ETIMEDOUT. Must be
some null-host out there somewhere. The original problem with my
ISP's ftp-service vanished.

-- 
--gv



[Bug-wget] strerror() on Win32

2016-10-13 Thread Gisle Vanem
I think I've mentioned this earlier: the trouble with strerror()
returning "Unknown error" for seemingly common 'errno' values.

It hit me today, when connecting to my ftp-hosting service. From
the Wsock-trace [1] of connect():

  * 49.163 sec: f:/MingW32/src/gnu/gnulib/lib/connect.c(43) (rpl_connect+64):
connect (620, 46.30.213.77:21, fam AF_INET) --> WSAETIMEDOUT (10060).

failed: Unknown error.

I put some trace code in Wget's connect.c and I do see 'errno' is 138.
Which is ETIMEDOUT as defined by Gnulib's . But I fail to
understand why Gnulib's strerror(138) is incapable of handling it.

Looking at Gnulib's strerror-override.c, I see it should return
"Connection timed out" there. But it doesn't. Any pointers?

I'm on Win-10 using MSVC-2015.

[1]: https://github.com/gvanem/wsock-trace
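One way to keep the two worlds apart is to translate Winsock codes into errno values once, at the point of failure, so that a single strerror() sees a value it knows. A hypothetical mapping for a few codes follows; the numbers are the documented WSAE* values (10060 is the WSAETIMEDOUT in the trace above), and a real build would use the constants from Winsock's header instead. It only shows the mapping step; it does not explain why Gnulib's strerror() is bypassed:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical Winsock-to-errno translation for a handful of codes. */
static int wsa_to_errno (int wsa_err)
{
  switch (wsa_err)
    {
    case 10035: return EWOULDBLOCK;    /* WSAEWOULDBLOCK */
    case 10054: return ECONNRESET;     /* WSAECONNRESET */
    case 10060: return ETIMEDOUT;      /* WSAETIMEDOUT */
    case 10061: return ECONNREFUSED;   /* WSAECONNREFUSED */
    default:    return EIO;            /* hypothetical catch-all */
    }
}
```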

-- 
--gv



Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-06-18 Thread Gisle Vanem
Eli Zaretskii wrote:

> IMO, this test should be bypassed on Windows.  The "world" part in
> "world-writeable" is a Unix-centric notion, and its translation into
> MS-Windows ACLs is non-trivial (read: "impossible").  (For example,
> your "non-world-writeable" file is accessible to certain users and
> groups of users on Windows, other than Administrator.)  So the sanest
> solution for this is simply not to make this test on Windows.

Makes sense. I agree.

-- 
--gv



Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-06-17 Thread Gisle Vanem
> +static bool
> +hsts_file_access_valid (const char *filename)
> +{
> +  struct_stat st;
> +
> +  if (stat (filename, ) == -1)
> +return false;
> +
> +  return !(st.st_mode & S_IWOTH) && S_ISREG (st.st_mode);

Due to the above patch, the following output on Wget/Windows seems
a bit paranoid; wget -d https://vortex.data.microsoft.com/collect/v1
  ...
  Reading HSTS entries from c:\Users\Gisle\AppData\Roaming/.wget-hsts
  Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
  ERROR: could not open HSTS store at 'c:\Users\Gisle\AppData\Roaming/.wget-hsts'. HSTS will be disabled.

On Windows this file is *not* "world-writable" AFAICS (and yes, it does
exist). Hence this "paranoia" should be accounted for. I'm not so much
into POSIX, so I'll leave it to you experts to comment & patch.
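A minimal sketch of the same check with the Windows bypass suggested on the list. 'hsts_file_access_ok()' is a hypothetical name; it assumes the permission bits a Windows stat() reports don't reflect the file's ACLs, so only the regular-file test is kept there:

```c
#include <stdbool.h>
#include <stdlib.h>     /* mkstemp() in the usage example */
#include <sys/stat.h>
#include <unistd.h>     /* close(), unlink() in the usage example */

/* The HSTS store must be a regular file and, on POSIX systems, must not
   be writable by "other". */
static bool hsts_file_access_ok (const char *filename)
{
  struct stat st;

  if (stat (filename, &st) == -1)
    return false;
#ifdef _WIN32
  return S_ISREG (st.st_mode);
#else
  return S_ISREG (st.st_mode) && !(st.st_mode & S_IWOTH);
#endif
}
```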

-- 
--gv



Re: [Bug-wget] Progress bar on MS-Windows

2016-06-07 Thread Gisle Vanem
Darshit Shah wrote:

> And for future reference, you could use rate limiting to simulate a slow
> connection and get the required large eta text.

As Eli wrote, I used to have this off-by-1 error too. But with
the Git master, it's no longer the case.

But with the '--limit-rate' option (I had completely forgotten about it), the
progress bar looks a bit different compared to downloading at full speed.

Compare the attached image wget-progress-1.png:
  wget --show-progress --quiet -np -r www.watt-32.net/watt-doc/

VS wget-progress-2.png:
  wget --show-progress --quiet --limit-rate=2k -np -r www.watt-32.net/watt-doc/

I think it's a bit strange the final d/l speed isn't "sticky" in both cases.
Is it because the speed is too high?

I'm on Win-10, Wget/MSVC-2015. Captures by Greenshot:
 https://sourceforge.net/projects/greenshot.

-- 
--gv





Re: [Bug-wget] HAVE_CARES on Windows

2016-04-11 Thread Gisle Vanem
Tim Rühsen wrote:

> As Eli, I would like to know a few more details.
> Is it possible to make c-ares return the 'native' socket numbers to not get in
> conflict with gnulib ?

As Eli pointed out, it's vice-versa; C-ares *does* return 'native'
socket numbers, while Gnulib's socket(), select() etc. create and
expect 'file descriptors', normally in the range >= 3 (?). (I assume
this has something to do with POSIX compliance. Winsock's socket()
never returns such low numbers.)

Eli> However, converting a handle into a
Eli> file descriptor and vice versa involves using 2 simple functions,

I'm not sure what those functions are since I'm not so much into Gnulib.

My intuition told me that 'rpl_select()' was the cause of the resolve
failure, hence this 'undef'. And since the 'select()' in host.c is used only for
'HAVE_LIBCARES' code, I felt it wouldn't hurt to do '#undef select' in host.c.

But I'm open to alternatives. Eli, can you try building with
'HAVE_LIBCARES'?

> It was not my intention to replace all 'the good old' methods, as long as they
> work. C-ares is just used for functionality that libc (and/or libresolv) does
> not provide. Wget is still an official GNU tool and that implies that we try
> to use as much GNU software/libraries/code as possible.

Okay. But I feel that C-ares should *not* need to be told what DNS-server(s)
to use. It is smart enough to figure that out for itself. Hence I find it
strange, and a bit confusing for new users of Wget, that only a '--dns-servers'
option will force the use of C-ares. IMHO it should be the default; 'wget -V'
says '+cares'. Or am I still missing something?

BTW. without my patch, I get this link failure:
  host.obj : error LNK2019: unresolved external symbol
  _select_used_without_including_sys_select_h referenced in function _wait_ares

  Another Gnulib idiosyncrasy; #include  must be included for
  'WINDOWS' too if one calls 'select()'.

-- 
--gv



[Bug-wget] HAVE_CARES on Windows

2016-04-09 Thread Gisle Vanem
I have tried building latest Wget with '-DHAVE_LIBCARES'
and all resolve attempts failed because Gnulib's select()
is not compatible with the socket number(s) returned from
a normal C-ares library on Windows.

This is what I did to fix it:

--- a/host.c 2016-04-09 17:45:44
+++ b/host.c 2016-04-09 21:48:06
@@ -694,6 +694,13 @@
   return al;
 }

+/* Since GnuLib's select() (i.e. rpl_select()) cannot handle socket-numbers
+ * returned from C-ares, we must use the original select() from Winsock.
+ */
+#ifdef WINDOWS
+#undef select
+#endif
+
 static void
 wait_ares (ares_channel channel)
 {

---

So with a command like 'wget --dns-servers=8.8.8.8 www.vg.no'
all is well.

But it seems strange to me that w/o the '--dns-servers' option
it falls back to good old 'gethostbyname_with_timeout_callback()'
method. Shouldn't the use of C-ares's wait_ares() be default w/o
this option? I must be missing something.

-- 
--gv



Re: [Bug-wget] Win32, assert() in progress.c

2016-03-05 Thread Gisle Vanem
Darshit Shah wrote:

> However, on a slight side note, we do need better CI tests for Windows.
> @Gisle, would it be possible for you to write a build rule for Wget on
> AppVeyor? I've tried, but lacking a Windows machine, writing those build
> scripts is pretty hard. AppVeyor provides a free Windows CI testing service.

Do you have any docs on AppVeyor+MSVC in general and AppVeyor+Wget
in particular? I've never played with CI-systems before (even with
my own stuff at github).

-- 
--gv



[Bug-wget] Win32, assert() in progress.c

2016-03-05 Thread Gisle Vanem
Darshit (?), the progress.c change to add the assert() at line 1167:
  assert (padding > 0 && "Padding length became non-positive!");

always triggers on MSVC-2015 and TDM-gcc (w/o -DNDEBUG).

Not sure it's because of 'determine_screen_width()',
'USE_NLS_PROGRESS_BAR=1' or something else, but a:
  assert (padding >= 0 && "Padding length became non-positive!");

fixes it. Some off-by-1 calculation?
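A hypothetical sketch of the padding arithmetic (not the progress.c code). When the fixed fields fill the line exactly, the padding is 0, which is valid; only a negative value is a layout bug, so '>= 0' is the invariant to assert:

```c
#include <assert.h>

/* Spaces left after the fixed-width fields of a progress line.
   'used' may equal 'width', so zero padding is allowed. */
static int compute_padding (int width, int used)
{
  int padding = width - used;
  assert (padding >= 0 && "Padding length became negative!");
  return padding;
}
```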

BTW,
here is what it looks like on Win-10 (120 x 50 screen):
  http://watt-32.net/misc/wget-progress-1.png   (default --progress=bar)
  http://watt-32.net/misc/wget-progress-2.png   (-q --show-progress)

with a modified version of Jernej Simončič's Windows TaskBar patch:
  https://eternallybored.org/misc/wget/src/taskbar-progress.patch

-- 
--gv



Re: [Bug-wget] Writing to a Read-Only directory

2016-02-10 Thread Gisle Vanem
Tim Ruehsen wrote:

> I fixed that issue in a tiny commit that I just pushed.
> 
> BTW, I remember we had this or a similar issue before... though I couldn't 
> find it with a quick search.

Sorry Tim, the error-message is still the same.
There are several return-paths in logprintf() where 'errno_saved'
isn't restored. This is what I did to prevent losing 'errno':

--- a/log.c 2016-02-10 18:09:07
+++ b/log.c 2016-02-10 18:53:25
@@ -277,21 +277,21 @@
 {   \
 case LOG_PROGRESS:  \
   if (!opt.show_progress)   \
-return; \
+goto quit;  \
   break;\
 case LOG_ALWAYS:\
   break;\
 case LOG_NOTQUIET:  \
   if (opt.quiet)\
-return; \
+goto quit;  \
   break;\
 case LOG_NONVERBOSE:\
   if (opt.verbose || opt.quiet) \
-return; \
+goto quit;  \
   break;\
 case LOG_VERBOSE:   \
   if (!opt.verbose) \
-return; \
+goto quit;  \
 }

 /* Returns the file descriptor for logging.  This is LOGFP, except if
@@ -351,6 +351,7 @@
 {
   FILE *fp;
   FILE *warcfp;
+  int errno_saved = errno;

   check_redirect_output ();
   if (o == LOG_PROGRESS)
@@ -359,7 +360,7 @@
 fp = get_log_fp ();

   if (fp == NULL)
-return;
+goto quit;

   warcfp = get_warc_log_fp ();
   CHECK_VERBOSE (o);
@@ -373,6 +374,9 @@
 logflush ();
   else
 needs_flushing = true;
+
+quit:
+  errno = errno_saved;
 }

 struct logvprintf_state {
@@ -547,7 +551,8 @@

   check_redirect_output ();
   if (inhibit_logging)
-return;
+goto quit;
+
   CHECK_VERBOSE (o);

   xzero (lpstate);
@@ -563,6 +568,7 @@
 }
   while (!done);

+quit:
   errno = errno_saved;
 }



The question is whether an errno lost in logprintf() has
caused havoc elsewhere!?
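The same save-and-restore idea as a small standalone function. 'log_printf()' and 'quiet' are hypothetical stand-ins for logprintf() and opt.quiet; every early exit goes through one label, so errno is always restored:

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

static int quiet;   /* hypothetical stand-in for opt.quiet */

/* Callers often log first and report strerror (errno) afterwards, so the
   logger must leave errno exactly as it found it. */
static void log_printf (FILE *fp, const char *fmt, ...)
{
  int errno_saved = errno;
  va_list args;

  if (fp == NULL || quiet)
    goto quit;

  va_start (args, fmt);
  vfprintf (fp, fmt, args);   /* stdio may modify errno */
  va_end (args);
  fflush (fp);

quit:
  errno = errno_saved;
}
```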



Re: [Bug-wget] Marking Release v1.17.1?

2015-12-16 Thread Gisle Vanem
Eli Zaretskii wrote:

> +  {
> +#ifdef WIN32
> + /* If the connection timed out, fd_close will hang in Gnulib's
> +close_fd_maybe_socket, inside the call to WSAEnumNetworkEvents.  */
> + if (errno != ETIMEDOUT)
> +#endif
> +   fd_close (sock);
> +  }
>  if (print)
>logprintf (LOG_NOTQUIET, _("failed: %s.\n"), strerror (errno));
>  errno = save_errno;

I assume fd_close() could cause 'errno' to be set again (for some
strange reason?). So shouldn't 'save_errno' be printed instead?

  if (print)
logprintf (LOG_NOTQUIET, _("failed: %s.\n"), strerror (save_errno));

Or a swap:
   errno = save_errno;
   if (print)
  logprintf (LOG_NOTQUIET, _("failed: %s.\n"), strerror (errno));
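A sketch of that ordering: capture errno first, close, print the saved value, then restore it. 'report_connect_failure()' is hypothetical, and plain close() stands in for fd_close():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format "failed: <reason>." for a failed connect on 'sock' into 'msg'.
   errno is captured before close(), which may overwrite it. */
static void report_connect_failure (int sock, char *msg, size_t msgsize)
{
  int save_errno = errno;

  if (sock >= 0)
    close (sock);
  snprintf (msg, msgsize, "failed: %s.", strerror (save_errno));
  errno = save_errno;
}
```

Closing an already-closed descriptor sets errno to EBADF, which is exactly the kind of clobbering the saved copy protects against.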


--gv



Re: [Bug-wget] Marking Release v1.17.1?

2015-12-12 Thread Gisle Vanem
Jernej Simončič wrote:

> Here's another one that I thought was already fixed, but apparently
> wasn't - --connect-timeout doesn't work on Windows without this patch

You're right. This is needed:

--- src/connect.c~0 2014-12-02 09:49:37.0 +0200
+++ src/connect.c   2015-03-17 17:14:48.414375000 +0200
@@ -364,7 +364,12 @@ connect_to_ip (const ip_address *ip, int
logprintf.  */
 int save_errno = errno;
 if (sock >= 0)
-  fd_close (sock);
+  {
+#ifdef WIN32
+   if (errno != ETIMEDOUT)
+#endif
+ fd_close (sock);
+  }


But I don't really understand why. Care to explain?

A simple test-case here on Win-10:

  timer & wget --connect-timeout=10 --tries=3 http://10.0.0.22:21 

  --2015-12-12 12:42:23--  http://10.0.0.22:21/
  Connecting to 10.0.0.22:21... failed: Unknown error.
  Retrying.

  --2015-12-12 12:42:44--  (try: 2)  http://10.0.0.22:21/
  Connecting to 10.0.0.22:21... failed: Unknown error.
  Retrying.

  --2015-12-12 12:43:06--  (try: 3)  http://10.0.0.22:21/
  Connecting to 10.0.0.22:21... failed: Unknown error.
  Giving up.

  Timer 1 off: 13.53.40  Elapsed: 0.00.33,08

Without your patch, that command never finishes.

The message wrongly says "Unknown error", but that is another matter...

-- 
--gv



Re: [Bug-wget] Fixing Test-k for Cygwin (and hopefully Windows)

2015-12-11 Thread Gisle Vanem
Tim Ruehsen wrote:

> That means the perl test suite is broken on Windows - thus no test suite at 
> all on Windows !? Perl is not able to fork on Windows ?

An option could be to use Perlfork:
  http://perldoc.perl.org/perlfork.html

But I'm no Perl expert.

-- 
--gv



Re: [Bug-wget] Windows cert store support

2015-12-10 Thread Gisle Vanem
Random Coder wrote:

> I'm not sure if the wget maintainers would be interested, but I've
> been carrying this patch around in my private builds of wget for a
> while.  It allows wget to load SSL certs from the default Windows cert
> store.

I've applied your patch. It seems to work fine. Nice!

But in a message like:
  X509 certificate successfully verified and matches host
  www.ssllabs.com

it would be nice to know if it succeeded because of WinCrypt or
OpenSSL.

> +  /* Loop through all the certs in the Windows cert store */
> +  for ( pCertCtx = Local_CertEnumCertificatesInStore(hStore, NULL);
> +  pCertCtx != NULL;
> +  pCertCtx = Local_CertEnumCertificatesInStore(hStore, pCertCtx) )
> +  {
> +if (!((pCertCtx->dwCertEncodingType & PKCS_7_ASN_ENCODING) == PKCS_7_ASN_ENCODING))
> +{
> +  /* Add all certs we find to OpenSSL's store */

How does this prevent an expired cert from being used?
I see a 'NotAfter' member in the 'CERT_INFO' structure. But this
struct seems to be supported for WINAPI_PARTITION_APP only :-(
I assume it could be used to check for expired certificates.

-- 
--gv



Re: [Bug-wget] Wget 1.17 doesn't compile on Windows (hsts.c)

2015-11-17 Thread Gisle Vanem
Tim Ruehsen wrote:

> BTW, I am pretty astonished that there are no Windows developers ever trying 
> to compile Wget before any release. How can we any longer support an OS 
> without any help from OS users ?

I build Wget on Windows all the time w/o any problem with flock().
(LOCK_EX is defined in Gnulib's , so what gives?)

BTW, why are there no DOS developers complaining about this? The Wget
  sources still have references to 'MSDOS'. I'm not even sure
  Gnulib on djgpp is possible any longer.
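For comparison, the POSIX side of an exclusive file lock is small. A sketch with hypothetical helper names, assuming flock() and LOCK_EX from <sys/file.h> (on Windows these come from Gnulib's replacement header, as noted above):

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/file.h>

/* Take an exclusive advisory lock on an open stream, as a writer of a
   shared file such as an HSTS database might.  With 'wait' false,
   LOCK_NB makes flock() fail at once instead of blocking while another
   process holds the lock. */
static bool lock_stream_exclusive (FILE *fp, bool wait)
{
  return flock (fileno (fp), LOCK_EX | (wait ? 0 : LOCK_NB)) == 0;
}

/* Flush buffered data, then release the lock. */
static bool unlock_stream (FILE *fp)
{
  fflush (fp);
  return flock (fileno (fp), LOCK_UN) == 0;
}
```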

-- 
--gv



Re: [Bug-wget] [Bulk] 1.16, w64: filename marquee one character too short; overall dipslay line one character too wide

2015-10-19 Thread Gisle Vanem

"AJ" 


2.) Displaying "eta" uses too much space when both minutes and seconds
have 2-digit numbers. Thus, the entire line needs 80 characters and
forces a linewrap, which turns out into NOT updating the line per
progress step, but instead prints new lines all over.


I believe this is due to the difference in behaviour between bash etc.
and WinCon when handling a character at the rightmost edge. Windows
will wrap the cursor and go to the next line. Bash won't act until it
sees the next character AFAICR.

You could try this patch:

--- a/utils.c  2015-10-19 20:20:20 +
+++ b/utils.c  2015-10-19 20:26:34 +

@@ -1827,7 +1827,7 @@
  CONSOLE_SCREEN_BUFFER_INFO csbi;
  if (!GetConsoleScreenBufferInfo (GetStdHandle (STD_ERROR_HANDLE), &csbi))
return 0;
-  return csbi.dwSize.X;
+  return csbi.dwSize.X - 1;
#else  /* neither TIOCGWINSZ nor WINDOWS */
  return 0;

I believe a 'set LINES=79' won't work on Windows  (?)
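The idea behind the patch, sketched portably: hold back the last column when the console wraps as soon as the rightmost cell is written, and clip the progress line to what remains. The helper names are hypothetical, and the clipping assumes one byte per column (plain ASCII).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of columns a progress line may use.
   A Windows console moves the cursor to the next line as soon as a
   character lands in the rightmost column, so one column is held back
   there (the same effect as returning 'csbi.dwSize.X - 1'). */
static int
usable_cols (int screen_cols, int wraps_at_edge)
{
  if (screen_cols <= 0)
    return 0;
  return wraps_at_edge ? screen_cols - 1 : screen_cols;
}

/* Copy at most 'cols' characters of 'line' into 'out' (ASCII only). */
static void
clip_line (char *out, size_t outsz, const char *line, int cols)
{
  assert (out != NULL && outsz > 0 && cols >= 0);
  snprintf (out, outsz, "%.*s", cols, line);
}
```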

--gv



Re: [Bug-wget] Feature: Disabling progress bar when wget is backgrounded

2015-09-23 Thread Gisle Vanem

Ángel González wrote:


I have adapted the above patch to C. See attachment.


What compiler did you use? I failed to compile your version
using MSVC v19.


Warning: I haven't tested it other than verifying that it compiles.


I've modified tbprogress.c to compile with both C and C++ using
MSVC v19. TDM-gcc 5.1 remains to be tested. The old-school MingW
(mingw.org) doesn't seem to have an up-to-date . So
I've given up on that. tbprogress.c is attached. It seems to work
fine.

Contrary to the Windows Console title progress in background-mode,
the ITaskbarList3 progress works fine (as it's not related to conhost
in any way). The "title progress" stops AFAICS, since 'FreeConsole()'
is called in ws_hangup().

--
--gv

/*
 * Adapted from:
 *   https://eternallybored.org/misc/wget/src/taskbar-progress.patch
 */

#if !defined(__cplusplus)
  #define CINTERFACE
  #define COBJMACROS
#endif

#include "config.h"

#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0500
#endif

#include 
#include 
#include 
#include 

#include "tbprogress.h"

#if !defined(_SHLOBJIDL_H) && !defined(__ITaskbarList3_INTERFACE_DEFINED__)
#error This file is not for you. Set ENABLE_TASKBAR=0.
#endif

const GUID CLSID_TaskbarList = { 0x56FDF344, 0xFD6D, 0x11d0,
                                 { 0x95,0x8A,0x00,0x60,0x97,0xC9,0xA0,0x90 } };
const GUID IID_ITaskbarList1 = { 0x56FDF342, 0xFD6D, 0x11d0,
                                 { 0x95,0x8A,0x00,0x60,0x97,0xC9,0xA0,0x90 } };
const GUID IID_ITaskbarList3 = { 0xea1afb91, 0x9e28, 0x4b86,
                                 { 0x90,0xe9,0x9e,0x9f,0x8a,0x5e,0xef,0xaf } };

#if !defined(__ITaskbarList3_INTERFACE_DEFINED__)
  typedef enum {
    TBPF_NOPROGRESS    = 0, /* Normal state / no progress bar */
    TBPF_INDETERMINATE = 1, /* Marquee style progress bar */
    TBPF_NORMAL        = 2, /* Standard progress bar */
    TBPF_ERROR         = 4, /* Red taskbar button to indicate an error occurred */
    TBPF_PAUSED        = 8  /* Yellow taskbar button to indicate user attention */
  } TBPFLAG;

  typedef void* LPTHUMBBUTTON;  /* dummy typedef! */
  typedef enum { TBATF_DUMMY } TBATFLAG;

  #undef  INTERFACE
  #define INTERFACE ITaskbarList3
  DECLARE_INTERFACE_(ITaskbarList3,IUnknown)
  {
    STDMETHOD(QueryInterface)(THIS_ REFIID,PVOID*) PURE;
    STDMETHOD_(ULONG,AddRef)(THIS) PURE;
    STDMETHOD_(ULONG,Release)(THIS) PURE;

    /* ITaskbarList(1) */
    STDMETHOD(HrInit)(THIS) PURE;
    STDMETHOD(AddTab)(THIS_ HWND hwnd) PURE;
    STDMETHOD(DeleteTab)(THIS_ HWND hwnd) PURE;
    STDMETHOD(ActivateTab)(THIS_ HWND hwnd) PURE;
    STDMETHOD(SetActiveAlt)(THIS_ HWND hwnd) PURE;

    /* ITaskbarList2 */
    STDMETHOD(MarkFullscreenWindow)(THIS_ HWND hwnd, BOOL fFullscreen) PURE;

    /* ITaskbarList3 */
    STDMETHOD(SetProgressValue)(THIS_ HWND hwnd, ULONGLONG ullCompleted, ULONGLONG ullTotal) PURE;
    STDMETHOD(SetProgressState)(THIS_ HWND hwnd, TBPFLAG tbpFlags) PURE;
    STDMETHOD(RegisterTab)(THIS_ HWND hwndTab, HWND hwndMDI) PURE;
    STDMETHOD(UnregisterTab)(THIS_ HWND hwndTab) PURE;
    STDMETHOD(SetTabOrder)(THIS_ HWND hwndTab, HWND hwndInsertBefore) PURE;
    STDMETHOD(SetTabActive)(THIS_ HWND hwndTab, HWND hwndMDI, TBATFLAG tbatFlags) PURE;
    STDMETHOD(ThumbBarAddButtons)(THIS_ HWND hwnd, UINT cButtons, LPTHUMBBUTTON pButton) PURE;
    STDMETHOD(ThumbBarUpdateButtons)(THIS_ HWND hwnd, UINT cButtons, LPTHUMBBUTTON pButton) PURE;
    STDMETHOD(ThumbBarSetImageList)(THIS_ HWND hwnd, HIMAGELIST himl) PURE;
    STDMETHOD(SetOverlayIcon)(THIS_ HWND hwnd, HICON hIcon, LPCWSTR pszDescription) PURE;
    STDMETHOD(SetThumbnailTooltip)(THIS_ HWND hwnd, LPCWSTR pszTip) PURE;
    STDMETHOD(SetThumbnailClip)(THIS_ HWND hwnd, RECT *prcClip) PURE;
  };
  #undef INTERFACE
#endif  /* __ITaskbarList3_INTERFACE_DEFINED__ */

static ITaskbarList3 *g_pTL = NULL;
static HWND   g_hwndConsole = NULL;
static int    TB_status = 0;

/* Use these macros to get at the methods.
 */
#ifdef __cplusplus
  #define COCREATEINSTANCE(cls,iunk,ctx,iid,pv)  CoCreateInstance (cls,iunk,ctx,iid,pv)
  #define HRINIT(iface)                          iface->HrInit()
  #define RELEASE(iface)                         iface->Release()
  #define SETPROGRESSVALUE(iface,hwnd,permille,max) \
          iface->SetProgressValue (hwnd, permille, max)
  #define SETPROGRESSSTATE(iface,hwnd,state) \
          iface->SetProgressState (hwnd, state)
#else
  #define COCREATEINSTANCE(cls,iunk,ctx,iid,pv) \
          CoCreateInstance ((REFCLSID)&(cls),iunk,ctx,(REFIID)&(iid),pv)
  #define HRINIT(iface)                          ITaskbarList3_HrInit (iface)
  #define RELEASE(iface)                         ITaskbarList3_Release (iface)
  #define SETPROGRESSVALUE(iface,hwnd,permille,max) \
          ITaskbarList3_SetProgressValue (iface,hwnd,permille,max)
  #define 

Re: [Bug-wget] Feature: Disabling progress bar when wget is backgrounded

2015-09-22 Thread Gisle Vanem

Tim Ruehsen wrote:


If I background wget, I definitely do not want to see a progress bar any more,
no matter if --show-progress bar is on or not (e.g. when the switch in ON via
wgetrc). Maybe it's a matter of taste - but I can't see any use in the
combination 'background' + 'progress=on'.


You're talking about progress inside the terminal. I think
it would be nice to have a progress "bar" on the Windows
console-title even in background-mode. Ref. ws_percenttitle().
That doesn't work now.

--
--gv



[Bug-wget] recur.c compile error

2015-08-14 Thread Gisle Vanem

The new reject stuff in recur.c:
  typedef enum
  {
SUCCESS, BLACKLIST, NOTHTTPS, NONHTTP, 1, 1, PARENT, LIST, REGEX,
RULES, SPANNEDHOST, ROBOTS
  } reject_reason;

causes errors with MSVC and MingW since in:
  math.h:952:#define DOMAIN  _DOMAIN i.e. 1
  wingdi.h:1893: #define ABSOLUTE  1

math.h is pulled in via some Gnulib headers. And wingdi.h via
windows.h. I suggest this simple fix:

--- a/src/recur.c   2015-08-14 21:45:44
+++ b/src/recur.c   2015-08-14 21:54:45
@@ -182,6 +182,9 @@
   return ret;
 }

+#undef ABSOLUTE
+#undef DOMAIN
+
 typedef enum
 {
   SUCCESS, BLACKLIST, NOTHTTPS, NONHTTP, ABSOLUTE, DOMAIN, PARENT, LIST, REGEX,

Or better names for the enumerations; 'RR_xx' ?
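A sketch of the 'RR_xx' alternative, assuming the enumerator list shown above. The two #defines only simulate what math.h and wingdi.h inject on MSVC/MinGW; with the prefix, the enum no longer collides with them and needs no #undef. The is_rejected() helper is hypothetical.

```c
#include <assert.h>

/* Simulate the clashing macros: DOMAIN from math.h (MSVC/MinGW) and
   ABSOLUTE from wingdi.h.  Both expand to 1. */
#define DOMAIN    1
#define ABSOLUTE  1

/* Prefixed enumerators cannot be rewritten by those macros. */
typedef enum
{
  RR_SUCCESS, RR_BLACKLIST, RR_NOTHTTPS, RR_NONHTTP, RR_ABSOLUTE, RR_DOMAIN,
  RR_PARENT, RR_LIST, RR_REGEX, RR_RULES, RR_SPANNEDHOST, RR_ROBOTS
} reject_reason;

/* Hypothetical helper: anything but RR_SUCCESS is a rejection. */
static int
is_rejected (reject_reason r)
{
  assert (r >= RR_SUCCESS && r <= RR_ROBOTS);
  return r != RR_SUCCESS;
}
```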

--
--gv



Re: [Bug-wget] Images from wordpress sites

2015-08-05 Thread Gisle Vanem

RJ Davis wrote:


Looks like a recent wordpress update, version 4.2.4, might of broken wget's
ability to pull images from such sites.

In the past I have used this command
wget -E -H -k -K -t 2 -p
http://thechive.com/2015/08/04/how-to-make-those-signature-drinks-from-some-of-your-favorite-films-11-photos/

and the files would be stored here
\thechive.files.wordpress.com\2015\08


Works fine here. You mean e.g. D/L .png-files into this dir:
 .\s0.wp.com\wp-content\mu-plugins\smileyproject\default\ie\

doesn't work? I just did it:
  http://i.imgur.com/2DhxuwV.png

PS.
  The above thumbnail-view made by IrfanView:
   http://www.irfanview.com/

  and screen-shot made by Greenshot:
   https://sourceforge.net/projects/greenshot/

  An awesome combination.

--
--gv



Re: [Bug-wget] [Bulk] [PATCH] Add option to write URL rejections to a CSV log.

2015-07-28 Thread Gisle Vanem

Jookia wrote:

I've not tried your patch, but from reading it:


+static void write_url_csv (FILE* f, struct url *url)
+{
+  if (!f)
+return;


Isn't this test superfluous? It's already done by the caller (?).

I'd suggest the reject-log start with a comment:

  FILE *rejectedlog = 0;
  if (opt.rejected_log)
    {
      rejectedlog = fopen (opt.rejected_log, "w");
      if (!rejectedlog)
        logprintf (LOG_NOTQUIET, "%s: %s\n", opt.rejected_log, strerror (errno));
      else
        fprintf (rejectedlog, "# Wget reject-log %s generated at --%s-- for %s\n",
                 opt.rejected_log,
                 datetime_str (time (NULL)),
                 base->url /* where is this stored? */);

But http://tools.ietf.org/html/rfc4180 doesn't specify whether
a # comment is legal before the CSV header. I think most DB-apps
do allow it though.
  http://stackoverflow.com/questions/1961006/can-a-csv-file-have-a-comment
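A minimal sketch of such a log: a '#' comment line, a header, then one row per rejected URL, with fields quoted the way RFC 4180 describes (enclosed in double quotes when they contain a comma, a quote or a line break; embedded quotes doubled). The two-column layout and the function names are hypothetical, not those of Jookia's patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one CSV field.  Per RFC 4180, a field containing a comma, a double
   quote, CR or LF is enclosed in double quotes, and any double quote inside
   it is written twice. */
static void
csv_field (FILE *f, const char *s)
{
  assert (f != NULL && s != NULL);
  if (strpbrk (s, ",\"\r\n") == NULL)
    {
      fputs (s, f);
      return;
    }
  fputc ('"', f);
  for (; *s; s++)
    {
      if (*s == '"')
        fputc ('"', f);
      fputc (*s, f);
    }
  fputc ('"', f);
}

/* Hypothetical reject-log layout: a '#' comment line, a header line,
   then one "REASON,URL" row per rejected URL. */
static void
reject_log_begin (FILE *f, const char *generated_at)
{
  fprintf (f, "# Wget reject-log generated at %s\n", generated_at);
  fputs ("REASON,URL\n", f);
}

static void
reject_log_row (FILE *f, const char *reason, const char *url)
{
  csv_field (f, reason);
  fputc (',', f);
  csv_field (f, url);
  fputc ('\n', f);
}
```

Readers that reject the comment line can be handled by simply not writing it; the rows themselves stay valid RFC 4180 either way.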

Nice work.

--
--gv



Re: [Bug-wget] cannot get 'wget --recursive' to work

2015-07-27 Thread Gisle Vanem

Dave Ohlsson wrote:


I would like to download this page:

 https://noppa.aalto.fi/noppa/kurssi/ms-a0210/viikkoharjoitukset

as well as its subpages, especially the .pdf documents:


...

When I give this command:

 $ wget --page-requisites --convert-links --recursive --level=0
--no-check-certificate --no-proxy -E -H -Dnoppa.aalto.fi
http://dnoppa.aalto.fi/ -k -d
https://noppa.aalto.fi/noppa/kurssi/ms-a0210/viikkoharjoitukset

I get only:

 $ ls -R
 .:
 noppa.aalto.fi

 ./noppa.aalto.fi:
 noppa  robots.txt

 ./noppa.aalto.fi/noppa:
 kurssi

 ./noppa.aalto.fi/noppa/kurssi:
 ms-a0210

 ./noppa.aalto.fi/noppa/kurssi/ms-a0210:
 viikkoharjoitukset.html

I have tried several wget options, with no luck.

What could be the problem?


Terve, terve (hello, hello).

This works fine here:
  wget -Apdf -r -np https://noppa.aalto.fi/noppa/kurssi/ms-a0210/viikkoharjoitukset

Although all the .PDFs end up as:

  noppa.aalto.fi\noppa\kurssi\ms-a0210\viikkoharjoitukset\MS-A0210_hints_for_w45.pdf
  noppa.aalto.fi\noppa\kurssi\ms-a0210\viikkoharjoitukset\MS-A0210_hints_for_w46.pdf
  noppa.aalto.fi\noppa\kurssi\ms-a0210\viikkoharjoitukset\MS-A0210_hints_for_w47.pdf
  ...

If you want them in the current directory, you can add
 '--no-directories' to the cmd-line.

BTW, I'm on Win-8. But this should work just the same on Unix.
  Oh, no wait, I see from your log User-Agent: Wget/1.13.4 (cygwin)
  That's not a real Unix :-)

--
--gv



Re: [Bug-wget] cannot get 'wget --recursive' to work

2015-07-27 Thread Gisle Vanem

Giuseppe Scrivano wrote:


Dave Ohlsson dave.ohls...@gmail.com writes:


I have tried several wget options, with no luck.

What could be the problem?


what happens when you also specify -e robots=off in the command?


You're correct. I forgot I had 'robots = off' in my
%WGETRC% file.

--
--gv



Re: [Bug-wget] Help request to download data from http

2015-07-13 Thread Gisle Vanem

Pedro LL wrote:


I am a beginner user of wget. I wanted to use it to download data from an 
specific

 website into my mac using unix but I have been unable after many many 
attempts.

I was typing in the terminal:
  wget -q -O - http://sodaserver.tamu.edu/assim/SODA_2.2.4/ | grep _2009 |

 wget -N --wait=0.5 --random-wait --force-html -i -

but it returns this:-: Cannot resolve incomplete link /icons/unknown.gif.-


Since the base-href was removed in the first output, you'll have
to add it yourself in the 2nd invocation of Wget. Something like:

  wget -q -O - http://sodaserver.tamu.edu/assim/SODA_2.2.4/ | grep _2009 |
   wget -N --base=http://sodaserver.tamu.edu/assim/SODA_2.2.4/ --wait=0.5
  --random-wait --force-html -i -

But I think you could just do:
  wget -rq -nd -np -A.cdf --accept-regex=.*_2009.* http://sodaserver.tamu.edu/assim/SODA_2.2.4/

directly.

--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-01 Thread Gisle Vanem

Eli Zaretskii wrote:


I don't see any threads created by run_with_timeout on my system, when
I download the above URL.  In fact, if I set a breakpoint in
run_with_timeout, the only 2 calls to it during the whole download are
from getaddrinfo_with_timeout and from connect_with_timeout, both with
timeout of zero, which calls the function synchronously, both on
Windows and on Posix hosts.

So I guess I don't see what Gisle describes as separate thread for
HTTPS reads.  What am I missing?


It depends on what 'opt.read_timeout' is globally. So
don't use read-timeout = 0 in your wgetrc.


--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Gisle Vanem

Tim Ruehsen wrote:


Some additional thoughts:
- TFO won't work with HTTPS as long as the used SSL library does not support
TFO.


Isn't SSL in Wget already rather slow, due to the way SSL_read()
is called in a SIGALRM-handler or a separate Win32-thread for
all (?) HTTPS reads?

'run_with_timeout()' seems to waste 1000s of good cycles per
SSL-read (at least on Win32). Couldn't this perhaps be improved
to use an a priori pool of e.g. 10 alarm-handlers or threads?
Just my €0.02.

--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Gisle Vanem

Tim Ruehsen wrote:


BTW, 1000 cycles on a GHz CPU is 1 micro second. How much does it influence
the overall download duration for your use case ? How often is SSL_Read called
in a real life use-case (e.g. downloading 1GB on a 2/10/50/100 mbps
connection).


Hard to tell since I didn't find any large files I could D/L via SSL.
You have one? But some quick tests (only a 48 kByte file):

  wget -q -O test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
  Elapsed: 0:00:02,35

  wget -qT0 -O test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
  Elapsed: 0:00:01,86

  curl -so test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
  Elapsed: 0:00:01,79

'-T0' shouldn't create any threads (like libcurl does). Hence the same
speed (but depends on many factors).

BTW, the timer is in my 4NT shell, and both Wget and curl use exactly the
  same OpenSSL DLLs (all built with the same 32-bit MSVC v18).

Will investigate further.

--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Gisle Vanem

Daniel Stenberg wrote:


On Thu, 30 Apr 2015, Tim Ruehsen wrote:


Originally, Gisle talked about CPU cycles, not elapsed time.
That is quite a difference...


Thousands of cycles per invoke * many invokes = measurable elapsed time


True it seems, but I've not tried SSL timings on a local net.
Some more info with the aid of the URL you provided:

wget -q -O NUL "https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox Setup 37.0.2.exe"

results in 9931 DLL attach/detaches!

For a 40 MByte file that is approx. 1 new thread per 4 kByte read.
I was thinking that increasing the read buffer would help. But where?
The code is a bit of a mess IMHO. Increasing the Rx buffer in
fd_read_body() didn't help. Is that the key place in this regard?

Without getting any numbers, I can see in 'Process Explorer'
that all those run_with_timeout() calls (and no '-T0') amount
to some more user+kernel time. I guess using a profiler is next.
Or maybe someone knows of a Win-program that can report total
CPU (kernel/user) time from the cmd-line?

BTW. My ISP gives me 25 Mbit/s in and 10 MBit/s out.
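The "thousands of cycles per invoke * many invokes" argument can be put in numbers. A back-of-the-envelope sketch, assuming (as observed above) one thread per ~4 kByte read whenever a nonzero read timeout is in effect, none with '-T0', and a per-call cost that you plug in yourself; the function names are hypothetical.

```c
#include <assert.h>

/* Rough, hypothetical cost model of the per-read timeout wrapper.
   Assumption: with a nonzero read timeout every read goes through
   run_with_timeout(), which on Windows starts one thread per call;
   with a timeout of 0 the read is made directly and no thread starts. */
static long long
threads_spawned (long long file_bytes, long read_bytes, double read_timeout)
{
  assert (read_bytes > 0 && file_bytes >= 0);
  if (read_timeout == 0)
    return 0;
  return (file_bytes + read_bytes - 1) / read_bytes;   /* one per read */
}

/* Extra time in milliseconds if each call costs 'cycles_per_call' CPU
   cycles on a clock of 'cpu_hz' Hz. */
static double
overhead_ms (long long calls, double cycles_per_call, double cpu_hz)
{
  assert (cpu_hz > 0);
  return (double) calls * cycles_per_call / cpu_hz * 1e3;
}
```

With Tim's figure of 1000 cycles on a 1 GHz CPU, about 10,000 calls come to roughly 10 ms. The real cost of creating and tearing down a thread is likely well above 1000 cycles, so the result scales up accordingly.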

--
--gv



Re: [Bug-wget] --connect-timeout doesn't work on Windows

2015-03-14 Thread Gisle Vanem

Tim Rühsen wrote:


There has been commit 4a685764a845d5c74a76fcb49a4671f055b8d5f4 (15.5.2011)
between release 1.12 and 1.13. It sets the socket to blocking after calling
gnulib's select().


This has nothing to do with connect() hanging. select() comes into play
later. Besides, my patch to remove 'set_windows_fd_as_blocking_socket()'
was voted down back then.

I see from the call-stack that connect() is hanging inside
Gnulib:
  ...
  ntdll.dll!ZwWaitForSingleObject+0xc
  ntdll.dll!RtlInterlockedPushEntrySList+0x32b
  ntdll.dll!RtlInterlockedPushEntrySList+0x355
  ntdll.dll!RtlResetNtUserPfn+0x3f
  wget.exe!close_fd_maybe_socket+0x36
  wget.exe!execute_close_hooks+0x30
  wget.exe!execute_all_close_hooks+0x17
  wget.exe!rpl_close+0x12
  wget.exe!connect_to_host+0x3ea
  wget.exe!gethttp+0x686
  wget.exe!http_loop+0x430
  wget.exe!retrieve_url+0x1cb
  ..

And from the command 'wget -d --connect-timeout=4 http://192.0.2.1:12345/' + ^C:

  Setting --connect-timeout (connecttimeout) to 4
  DEBUG output created by Wget 1.15.00 (09-March-2015) on Win-8.1. Build 9600 (MSVC).
  ...
  seconds 0.00, Connecting to 192.0.2.1:12345... seconds 4.00,

So it seems the 'run_with_timeout()' used for connecting works fine, since
the thread is not running. Instead, Gnulib's close() seems to be blocking
for an unknown reason.

--
--gv



[Bug-wget] xfree() crashes on libidn memory

2015-02-10 Thread Gisle Vanem

I got a crash in my MSVC-built Wget by accidentally mixing
debug and release obj/libs in my build. (libidn in debug-mode
and Wget in release-mode). So the memory returned from
'idn_decode()' in connect.c:

if (opt.enable_iri && (name = idn_decode ((char *) print)) != NULL)
...
xfree (name);

triggered a crash; the CRT mem-block layouts are different in
release and debug-modes. The important thing is that IDN should
free its own memory, using idn_free(). This function has been in
there since 2004!

Same problem in host.c too. Here is a patch:

--- a/connect.c   2015-02-05 15:31:22 +
+++ b/connect.c2015-02-05 15:33:27 +
@@ -278,7 +278,7 @@
   str = xmalloc (len);
   snprintf (str, len, "%s (%s)", name, print);
   str[len-1] = '\0';
-  xfree (name);
+  idn_free (name);
 }

   logprintf (LOG_VERBOSE, _("Connecting to %s|%s|:%d... "),


--- a/host.c  2015-02-05 15:31:22 +
+++ b/host.c   2015-02-03 01:57:33 +
@@ -741,7 +741,7 @@
   str = xmalloc (len);
   snprintf (str, len, "%s (%s)", name, host);
   str[len-1] = '\0';
-  xfree (name);
+  idn_free (name);
 }

   logprintf (LOG_VERBOSE, _("Resolving %s... "),


--

This probably is alien stuff to you Unix folks. You have no
release/debug-modes to take care of (?)
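The general rule, sketched with a hypothetical library: memory a library allocates goes back to that library's own deallocator (idn_free() in libidn's case), because the library and the application may be linked against different C runtimes, e.g. a debug and a release CRT with MSVC. The lib_* names are made up; in one translation unit both sides naturally share a single CRT, so this only shows the shape of the API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- hypothetical library side --------------------------------------- */
/* The library allocates with *its* allocator ... */
static char *
lib_decode (const char *in)
{
  size_t n = strlen (in) + 1;
  char *out = malloc (n);
  if (out)
    memcpy (out, in, n);
  return out;
}

/* ... and exports the matching deallocator, like idn_free() in libidn. */
static void
lib_free (void *p)
{
  free (p);
}

/* --- application side ------------------------------------------------ */
/* Hypothetical caller: hand the result back with lib_free(), never with
   the application's own free()/xfree(). */
static size_t
decoded_length (const char *in)
{
  char *name = lib_decode (in);
  size_t len;

  assert (in != NULL);
  if (!name)
    return 0;
  len = strlen (name);
  lib_free (name);
  return len;
}
```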

--
--gv




[Bug-wget] Use of 'ssl_st'

2015-02-05 Thread Gisle Vanem

The recent changes in OpenSSL's API have caused libcurl, Wget
and probably other packages to break. Here is one example:

  openssl.c(548) : error C2037: left of 'state' specifies undefined struct/union 'ssl_st'

This structure is now tucked away in ssl_locl.h.

What can be done about this? There is probably a function/macro
for it now. Here I just did a:

--- a/openssl.c   2015-02-05 15:31:22 +
+++ b/openssl.c   2015-02-05 16:22:54 +
@@ -545,7 +545,11 @@
 DEBUGP (("SSL handshake timed out.\n"));
 goto timeout;
   }
-  if (scwt_ctx.result <= 0 || conn->state != SSL_ST_OK)
+  if (scwt_ctx.result <= 0
+#if (OPENSSL_VERSION_NUMBER < 0x1010L)
+   || conn->state != SSL_ST_OK
+#endif
+   )
 goto error;

   ctx = xnew0 (struct openssl_transport_context);

PS. Would it one day be possible to build Wget using BoringSSL?
  https://boringssl.googlesource.com/
AFAIK, BoringSSL aims at OpenSSL compatibility, but it's not quite
there yet.

--
--gv



Re: [Bug-wget] [patch] uuid generation in warc.c

2015-01-05 Thread Gisle Vanem

Gisle Vanem wrote:


And for Windows? I guess the 'UuidCreate()' or 'UuidCreateSequential()'
functions from Rpcrt4.dll could be used?

I could write a patch for loading Rpcrt4.dll at run-time if
there's some interest.


Done that now. It was simple enough:

--- Git-latest/src/warc.c2014-12-28 18:03:37 +
+++ src/warc.c  2014-12-28 18:11:48 +
@@ -61,6 +61,7 @@
 #ifdef WINDOWS
 /* we need this on Windows to have O_TEMPORARY defined */
 #include <fcntl.h>
+#include <rpc.h>
 #endif

 #ifndef O_TEMPORARY
@@ -631,6 +632,18 @@
   sprintf (urn_str, "urn:uuid:%s", uuid_str);
   xfree (uuid_str);
 }
+#elif defined(WINDOWS)
+void
+warc_uuid_str (char *urn_str)
+{
+  BYTE *uuid_str;
+  UUID  uuid;
+
+  UuidCreate (&uuid);
+  UuidToString (&uuid, &uuid_str);
+  sprintf (urn_str, "urn:uuid:%s", uuid_str);
+  RpcStringFree (&uuid_str);
+}
 #else
 /* Fills urn_str with a UUID based on random numbers in the format
required for the WARC-Record-Id header.



But I have no idea how to add '-lrpcrt4' or 'rpcrt4.lib' to LDADD
conditionally for Win32. Like so?

+
+if WIN32
+  LDADD += -lrpcrt4
+endif
+
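For comparison, the formatting step on its own as a portable sketch: 16 bytes (assumed to come from a good random source, which is the hard part) become a version-4 UUID in the 'urn:uuid:' form used for the WARC-Record-Id header. This is not Wget's actual non-Windows fallback, only an illustration; the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Format 16 bytes as "urn:uuid:xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".
   The version and variant bits are forced in place, so 'b' is modified.
   'urn_str' must have room for at least 46 bytes (45 chars + NUL).
   Returns the number of characters written. */
static int
uuid_urn_from_bytes (unsigned char b[16], char *urn_str)
{
  assert (b != NULL && urn_str != NULL);
  b[6] = (unsigned char) ((b[6] & 0x0F) | 0x40);  /* version 4 (random) */
  b[8] = (unsigned char) ((b[8] & 0x3F) | 0x80);  /* RFC 4122 variant */
  return sprintf (urn_str,
                  "urn:uuid:%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                  "%02x%02x%02x%02x%02x%02x",
                  b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
                  b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```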

--
--gv



Re: [Bug-wget] FTP tests fail on MS-Windows

2014-12-22 Thread Gisle Vanem

Tim Rühsen wrote:


P.S. Now I'm beginning to wonder whether anyone runs the test suite on
Windows, or cares about the results...


Looks like nobody does :-(


I've tried, but failed since my Perl (Strawberry Perl 5.18.4) doesn't
support the fork() call. I don't think ActiveState Perl does either.

--
--gv



Re: [Bug-wget] [patch] uuid generation in warc.c

2014-12-16 Thread Gisle Vanem

Tim Ruehsen wrote:


It looks like a good opportunity to fix ./configure's libuuid detection.

We just have to agree on an approach.
Suggestion:
if --with-libuuid explicitly specified
   search for libuuid (pkg-config or fallback to AC_SEARCH_LIBS)


And for Windows? I guess the 'UuidCreate()' or 'UuidCreateSequential()'
functions from Rpcrt4.dll could be used?

I could write a patch for loading Rpcrt4.dll at run-time if
there's some interest.

--
--gv



Re: [Bug-wget] [PATCH] Fixing C89 warnings

2014-12-03 Thread Gisle Vanem

That's really good news to me. But there are still lots
more C99 errors. Especially in main.c.


Now another C99 error got into openssl.c. Patch:

--- Git-latest/src/openssl.c2014-12-03 14:06:19 +
+++ openssl.c   2014-12-03 14:14:37 +
@@ -170,6 +170,7 @@
 ssl_init (void)
 {
   SSL_METHOD const *meth;
+  long ssl_options = 0;

 #if OPENSSL_VERSION_NUMBER >= 0x00907000
   if (ssl_true_initialized == 0)
@@ -203,8 +204,6 @@
   SSLeay_add_all_algorithms ();
   SSLeay_add_ssl_algorithms ();

-  long ssl_options = 0;
-
   switch (opt.secure_protocol)
 {
 #ifndef OPENSSL_NO_SSL2

---

FYI. There are gcc options to trigger an error for these
  cases, such as 'gcc -Wdeclaration-after-statement -Werror'.
  But -Werror also makes other, harmless warnings fatal.

--
--gv



[Bug-wget] [Patch] xfree() in mswindows.c

2014-12-03 Thread Gisle Vanem

The change of the 'xfree()' macro should be reflected
in mswindows.c:

--- Git-latest/src/mswindows.c  2014-12-01 17:42:20 +
+++ mswindows.c 2014-12-03 14:43:17 +
@@ -89,7 +89,7 @@
 static void
 ws_cleanup (void)
 {
-  xfree ((char*)exec_name);
+  xfree (exec_name);
   WSACleanup ();
 }

---

Otherwise gcc says:
  mswindows.c: In function 'ws_cleanup':
  mswindows.c:92:3: error: lvalue required as left operand of assignment

It went unnoticed since I mostly use MSVC now.
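Why gcc complains, sketched: assuming the new xfree() is roughly a macro that frees the block and then assigns NULL to its argument, the argument must be an lvalue. 'xfree (exec_name)' expands to '(exec_name) = NULL', which is fine, whereas '(char*)exec_name' is a cast expression and not an lvalue, so ISO C rejects the assignment (MSVC presumably accepts it through its cast-on-lvalue extension). The macro below is an approximation, and release_name() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Roughly what an "xfree that also clears the pointer" looks like;
   Wget's own definition may differ in detail. */
#define xfree(p) do { free ((void *) (p)); (p) = NULL; } while (0)

/* Hypothetical caller: *namep is an lvalue, so xfree() can reset it. */
static void
release_name (char **namep)
{
  assert (namep != NULL);
  xfree (*namep);
}
```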

--
--gv



[Bug-wget] 'xfree' undefined

2014-11-27 Thread Gisle Vanem

The recent change [1] to gettext.h of replacing free() with
xfree() has generated many warnings on MSVC:

  cl -nologo -MD ... -c cookies.c
  g:\mingw32\src\inet\wget\src\gettext.h(218) : warning C4013: 'xfree' undefined;
  assuming extern returning int

The cause is in gettext.h (the package from Hell IMHO); it is
included in wget.h before utils.h, which defines xfree. The code
that now calls xfree() is only used when this is false:
 #define _LIBGETTEXT_HAVE_VARIABLE_SIZE_ARRAYS \
  (((__GNUC__ >= 3 || __GNUG__ >= 2) && !__STRICT_ANSI__) \
   /* || __STDC_VERSION__ >= 199901L */ )

I.e. GNU only. So what to do? Here I just did:
  @@ -176,6 +176,7 @@

   #if !_LIBGETTEXT_HAVE_VARIABLE_SIZE_ARRAYS
   #include <stdlib.h>
  +#define xfree free
   #endif

Wasn't there an alloca() discussion recently? Why not use
that instead? Far more portable IMHO.

[1]: 2014-11-27  Darshit Shah  dar...@gmail.com

* cookies.c, gettext.h, init.c, retr.c, url.c, warc.c: Replace usage of
free() with xfree() macro.

--
--gv



Re: [Bug-wget] [PATCH] Fixing C89 warnings

2014-11-20 Thread Gisle Vanem

Tim Ruehsen wrote:


You don't have the latest git version, at least your diff isn't based on it.


Okay. I was too quick. It builds fine with MSVC v16 except
for http.c and the rubbish in progress.c. See [1] below.
Diffs for http.c:

--- ../Git-latest/src/http.c2014-11-20 15:39:55 +
+++ ./http.c2014-11-20 15:54:05 +
@@ -1191,8 +1191,9 @@
 parse_content_disposition (const char *hdr, char **filename)
 {
   param_token name, value;
-  *filename = NULL;
   bool is_url_encoded = false;
+
+  *filename = NULL;
   for ( ; extract_param (&hdr, &name, &value, ';', &is_url_encoded);
 is_url_encoded = false)



I guess MSVC 16 can be used on WinXP while MSVC 18 is for Win7 and up ?


Correct. I do have another PC with Win 8.1 and VC Express 2013
(MSVC v18), but feel more at ease with Win-XP ATM.


Please redo your diff and check if the errors below still exist (looks like
size_t is not available !?).


No problem with 'size_t' on MSVC v16.

[1]:
  #else
  # define count_cols(mbs) ((int)(strlen(mbs)))
  # define cols_to_bytes(mbs, cols, *ncols) do {  \
  *ncols = cols;  \
  bytes = cols;   \
  }while (0)
  #endif

(I forgot to add 'HAVE_WCWIDTH' and 'HAVE_MBTOWC').

And BTW. Since large-files are default on Windows (wgint is
  __int64), this patch should be applied:

--- ../Git-latest/src/build_info.c.in   2014-10-30 13:33:41 +
+++ ./build_info.c.in   2014-11-05 13:28:49 +
@@ -2,11 +2,12 @@
 https   defined HAVE_SSL
 ipv6defined ENABLE_IPV6
 iri defined ENABLE_IRI
-large-file  SIZEOF_OFF_T >= 8
+large-file  SIZEOF_OFF_T >= 8 || defined WINDOWS


--gv



Re: [Bug-wget] [PATCH] Fixing C89 warnings

2014-11-20 Thread Gisle Vanem

Tim Ruehsen wrote:


[1]:
#else
# define count_cols(mbs) ((int)(strlen(mbs)))
# define cols_to_bytes(mbs, cols, *ncols) do {  \
*ncols = cols;  \
bytes = cols;   \
}while (0)
#endif

(I forgot to add 'HAVE_WCWIDTH' and 'HAVE_MBTOWC').


? could you send a patch ? I am not sure what to fix here.


FYI. the error from MSVC was:
  progress.c(844) : error C2010: '*' : unexpected in macro formal parameter list
  progress.c(978) : error C2059: syntax error : 'do'

Here is a patch:

--- ../Git-latest/src/progress.c2014-11-20 15:39:55 +
+++ progress.c  2014-11-20 16:44:03 +
@@ -841,10 +841,7 @@
 }
 #else
 # define count_cols(mbs) ((int)(strlen(mbs)))
-# define cols_to_bytes(mbs, cols, *ncols) do {  \
-*ncols = cols;  \
-bytes = cols;   \
-}while (0)
+# define cols_to_bytes(mbs, cols, ncols) *ncols = cols
 #endif


Does this hold true for Win32 (WinXP 32bit) ?
Or do we have to amend this check ?


Windows has supported files larger than 4 GB since way back. It's
the compilers that were slow to follow. Since MingW/MSVC have libc
support for huge files, 'wgint' is hardcoded to 64 bits signed. I
vaguely remember that Hrvoje and I discussed this long before you
switched to Gnulib.
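If the build should enforce that assumption, a C89-style compile-time check is enough. A sketch: 'wgint_sketch' is a hypothetical stand-in for 'wgint' (which is __int64 on Windows), int64_t is used only to keep the example portable, and needs_large_file() is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t wgint_sketch;

/* Compile-time check: the array size becomes -1, which is a hard
   error, unless the type is exactly 8 bytes wide. */
typedef char wgint_must_be_64_bits[sizeof (wgint_sketch) == 8 ? 1 : -1];

/* Hypothetical helper: does a size need more than 32 bits? */
static int
needs_large_file (wgint_sketch size)
{
  assert (size >= 0);
  return size > (wgint_sketch) 0xFFFFFFFFUL;
}
```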

--
--gv



Re: [Bug-wget] [PATCH] Fixing C89 warnings

2014-11-19 Thread Gisle Vanem

Tim Rühsen wrote:


This patch fixes most C89 warnings for me (-std=c89 -pedantic) since these may
prevent from compiling with MSVC.


That's really good news to me. But there are still lots
more C99 errors. Especially in main.c.

--gv



Re: [Bug-wget] [Win32 Patch] console-close events

2014-10-21 Thread Gisle Vanem

Ángel González keis...@gmail.com wrote:


I would expect someone running a console application to have a console open. I 
understand the rationale when it's a beginner
learning to program and is creating a console application, but don't really see 
a usecase for wget. How are you running wget that
you get an autoclosing console?


Easily. When e.g. Windows needs to restart (because of a WinUpdate etc.). It
sends a WM_QUERYENDSESSION to all (?) top-level windows. The
Console handler translates that to a CTRL_SHUTDOWN_EVENT for the
program in that console. Details here:
 
http://blogs.msdn.com/b/ntdebugging/archive/2007/06/09/how-windows-shuts-down.aspx

Darshit, about git format-patch: I don't know how.

The src/Changelog entry could simply be:

2014-10-21  Gisle Vanem gva...@yahoo.no

* mswindows.c (ws_handler): Added handling of
  CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT
  to cleanup before Wget exits. Added function ws_event_name() to retrieve
  the event name.

The diff for mswindows.c is as in my original message.
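The ws_event_name() mapping can also be written as a switch. In this sketch the fallback #defines repeat the values wincon.h gives the console control events and exist only so it compiles off Windows; ws_event_is_shutdown() is a hypothetical helper for the "clean up before the console goes away" cases.

```c
#include <assert.h>

#ifdef _WIN32
# include <windows.h>
#else
/* Values from wincon.h, repeated so the sketch compiles elsewhere. */
# define CTRL_C_EVENT         0
# define CTRL_BREAK_EVENT     1
# define CTRL_CLOSE_EVENT     2
# define CTRL_LOGOFF_EVENT    5
# define CTRL_SHUTDOWN_EVENT  6
typedef unsigned long DWORD;
#endif

static const char *
ws_event_name (DWORD event)
{
  switch (event)
    {
    case CTRL_C_EVENT:        return "CTRL_C_EVENT";
    case CTRL_BREAK_EVENT:    return "CTRL_BREAK_EVENT";
    case CTRL_CLOSE_EVENT:    return "CTRL_CLOSE_EVENT";
    case CTRL_LOGOFF_EVENT:   return "CTRL_LOGOFF_EVENT";
    case CTRL_SHUTDOWN_EVENT: return "CTRL_SHUTDOWN_EVENT";
    default:                  return "UNKNOWN EVENT";
    }
}

/* Hypothetical helper: events after which the console (or the whole
   session) is about to go away, so cleanup must happen right now. */
static int
ws_event_is_shutdown (DWORD event)
{
  return event == CTRL_CLOSE_EVENT
         || event == CTRL_LOGOFF_EVENT
         || event == CTRL_SHUTDOWN_EVENT;
}
```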

--gv




[Bug-wget] [Patch] progress.c

2014-10-10 Thread Gisle Vanem
The recent change to progress.c doesn't compile with 
'USE_NLS_PROGRESS_BAR = 0'. The error from gcc was:


 progress.c:843:35: error: * may not appear in macro parameter list
 progress.c: In function 'create_image':
 progress.c:975:7: warning: implicit declaration of function 'cols_to_bytes' [-Wimplicit-function-declaration]

This patch works here:

--- Git-latest/src/progress.c   2014-10-09 21:14:26 +
+++ src/progress.c  2014-10-10 12:11:33 +
@@ -840,10 +840,8 @@
}
#else
# define count_cols(mbs) ((int)(strlen(mbs)))
-# define cols_to_bytes(mbs, cols, *ncols) do {  \
-*ncols = cols;  \
-bytes = cols;   \
-}while (0)
+# define cols_to_bytes(mbs, cols, ncols) ( \
+   *ncols = cols), cols
#endif

---

And since the ret-val from cols_to_bytes() is used, the 'do { } while(0)'
had to go.
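A slightly more defensive form of the same fallback, as a sketch: parenthesising the whole comma expression makes the macro usable inside any expression. With the patch's '( *ncols = cols), cols', a statement like 'bytes = cols_to_bytes (...)' parses as '(bytes = (*ncols = cols)), cols', which only works because both operands equal 'cols'. The caller fit_bytes() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Fallback when no multibyte support is compiled in: one byte per column. */
#define count_cols(mbs) ((int) strlen (mbs))
#define cols_to_bytes(mbs, cols, ncols) \
  ((void) (mbs), *(ncols) = (cols), (cols))

/* Hypothetical caller: bytes of 's' that fit into 'width' columns. */
static int
fit_bytes (const char *s, int width, int *ncols)
{
  int cols = count_cols (s);
  assert (width >= 0);
  if (cols > width)
    cols = width;
  return cols_to_bytes (s, cols, ncols);
}
```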

--gv



Re: [Bug-wget] FW: Wget export URL list

2014-09-04 Thread Gisle Vanem

The PowerTool thepowertool...@hotmail.com wrote:


The simple script can be found at 
http://www.comp.eonworks.com/scripts/isolate_url_link-20020716.html

This assumes a real OS (not win*) and have a small working knowledge of wget, 
CL (bash), and can dl and extract the script.


Then I guess:
 lynx -dump -listonly www.xxx.com > URL-results.txt

would be easier. Lynx is available for Windows too.
URL-results.txt would only contain:

References

  1. http://www.bt.no/
  2. http://www.bt.no/rss/
  3. http://www.bt.no/nyheter/

So some further filtering may be needed.
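That filtering can be as simple as keeping the lines that look like "  1. http://...". A minimal sketch in C; the function name is hypothetical, and keeping only http:// and https:// links is an assumption (Lynx may also list other schemes, such as mailto:).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return a pointer to the URL on a line of the form "   1. http://...",
   or NULL for anything else (the "References" heading, blank lines,
   non-HTTP links). */
static const char *
listonly_url (const char *line)
{
  const char *p = line;

  assert (line != NULL);
  while (*p == ' ')
    p++;
  if (!isdigit ((unsigned char) *p))
    return NULL;
  while (isdigit ((unsigned char) *p))
    p++;
  if (*p != '.')
    return NULL;
  p++;
  while (*p == ' ')
    p++;
  if (strncmp (p, "http://", 7) == 0 || strncmp (p, "https://", 8) == 0)
    return p;
  return NULL;
}
```

Reading the Lynx output line by line with fgets() and printing every non-NULL result would turn this into a small filter for URL-results.txt.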

--gv



[Bug-wget] Remove set_windows_fd_as_blocking_socket()?

2014-08-05 Thread Gisle Vanem
A little study of why calling the function 'set_windows_fd_as_blocking_socket()'
is needed reveals that when downloading a big file (400 MByte using FTP), the
Winsock functions 'WSASetLastError()' and 'WSAGetLastError()' are called
approx. 277,000 times!! IMHO this is a terrible waste of CPU-time.


So I read a bit from the Gnulib bug-reports from 2011 on this:
 https://lists.gnu.org/archive/html/bug-gnulib/2011-04/msg00201.html

(the URL in mswindows.c seems to be a dead link).


My test with disabling this function all together showed no ill effect.

I failed to find a Gnulib reference/changelog where this bug was fixed
(after 2011), and from reading some of the sources I could not find any place
where a network-socket is set to blocking in Gnulib's select() or poll().


Hence I request that the call to 'set_windows_fd_as_blocking_socket()'
should be removed.

BTW, the actual call to Winsock's ioctlsocket() is performed by some
hairy FD-hook stuff inside Gnulib, if anybody wants to verify it.


--gv




Re: [Bug-wget] Dangling info-lfilename

2014-07-19 Thread Gisle Vanem

Darshit Shah dar...@gmail.com wrote:


Could you please confirm if the above patch fixes the problem?


Giuseppe has reverted that relevant change, i.e. it's now a
buffer like it used to be:


@@ -123,7 +123,7 @@ struct fake_fork_info
{
HANDLE event;
bool logfile_changed;
- char lfilename[MAX_PATH + 1];
+ char *lfilename;
};

2014-06-19  Giuseppe Scrivano  gscri...@redhat.com

* mswindows.c (fake_fork_child): Revert dinamic allocation of
info-lfilename.
Reported by: Gisle Vanem gva...@yahoo.no

--gv



Re: [Bug-wget] Dangling info-lfilename

2014-07-19 Thread Gisle Vanem

Could you please confirm if the above patch fixes the problem?


Giuseppe has reverted that relevant change, i.e. it's now a
buffer like it used to be:


So yes, that patch fixes the problem of the dangling pointer.
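Why the fixed-size buffer matters, as a sketch: assuming (as the revert suggests) that fake_fork_info is exchanged between parent and child through memory both of them map, a 'char *lfilename' set by one process points into that process's own heap and is meaningless, i.e. dangling, in the other. Storing the name inline means a plain byte copy of the struct carries the name with it. The struct, constant and function below are simplified, hypothetical stand-ins, and the "shared" block is simulated with an ordinary array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PATH 260   /* stand-in for Windows' MAX_PATH */

/* Simplified, hypothetical version of fake_fork_info with the name inline. */
struct ff_info
{
  bool logfile_changed;
  char lfilename[SKETCH_MAX_PATH + 1];
};

/* "Child" side: record a new log file name in the shared block by value. */
static void
child_set_logfile (unsigned char *shared, const char *name)
{
  struct ff_info info;

  assert (shared != NULL && name != NULL);
  memcpy (&info, shared, sizeof info);
  info.logfile_changed = true;
  snprintf (info.lfilename, sizeof info.lfilename, "%s", name);
  memcpy (shared, &info, sizeof info);
}
```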

--gv




[Bug-wget] Question on WARC

2014-07-18 Thread Gisle Vanem

Hello list.

I have been toying around with the '--warc-*' options in Wget.
And it seems to work like a charm with my MingW/Win-XP version. 

But the question I'm left with is: nothing tells me whether
Wget loads content from a local WARC-cache *or* reads the content
from the network. If the response is the same (based on 'Last-Modified' etc.,
I guess), it would be nice to know where the data came from.

Or have I misunderstood the purpose of WARC in Wget? The Wget
docs on it seem rather limited.


--gv



Re: [Bug-wget] Question on WARC

2014-07-18 Thread Gisle Vanem

Gisle Vanem gva...@yahoo.no wrote:

Or have I misunderstood the purpose of WARC in Wget? The Wget 
docs on it seem rather limited.


Seems I have: the WARC file gets overwritten each time Wget runs.
I assumed the file(s) would be rotated. Obviously not.

--gv



Re: [Bug-wget] [Win32 Patch] console-close events

2014-06-16 Thread Gisle Vanem

Ángel González keis...@gmail.com wrote:


+  event == CTRL_SHUTDOWN_EVENT ? "CTRL_SHUTDOWN_EVENT" :
+  "?");

Better something like "UNKNOWN EVENT"?


Ok. That seems better.


PS. The above Sleep() seems to be ignored by WinCon. At least I failed to
 make it sleep more than ~500 msec.

There may be a timeout on how long you can stay processing the event.

Why do you need that Sleep() call at all? I would remove it.


When logging to the console (no '-o log-file' option), the Sleep(500) makes
the final "... cleanup." message stay a tiny bit longer (though it is barely readable).
Without a Sleep(), the console gets closed with only a message-beep.


Thanks for sharing your patch!


Thanks for the interest. Do you build Wget on Win32? Does anybody else
besides me build and care about Wget/Win32 here?

Revised patch:

--- mswindows.c.orig  2014-06-12 13:02:44 +
+++ mswindows.c 2014-06-17 00:58:08 +
@@ -42,6 +42,7 @@

 #include "utils.h"
 #include "url.h"
+#include "init.h"

 #ifndef ES_SYSTEM_REQUIRED
 #define ES_SYSTEM_REQUIRED  0x0001
@@ -337,6 +338,17 @@
   /* If we get here, we're the child.  */
 }

+/* Return the name for the console-events we might receive. */
+static const char *ws_event_name (DWORD event)
+{
+  return (event == CTRL_C_EVENT ? "CTRL_C_EVENT" :
+          event == CTRL_BREAK_EVENT ? "CTRL_BREAK_EVENT" :
+          event == CTRL_CLOSE_EVENT ? "CTRL_CLOSE_EVENT" :
+          event == CTRL_LOGOFF_EVENT ? "CTRL_LOGOFF_EVENT" :
+          event == CTRL_SHUTDOWN_EVENT ? "CTRL_SHUTDOWN_EVENT" :
+          "UNKNOWN EVENT");
+}
+
 static BOOL WINAPI
 ws_handler (DWORD dwEvent)
 {
@@ -352,6 +364,16 @@
       ws_hangup ("CTRL+Break");
       return TRUE;
 #endif
+    case CTRL_CLOSE_EVENT:
+    case CTRL_LOGOFF_EVENT:
+    case CTRL_SHUTDOWN_EVENT:
+      MessageBeep (MB_OK);
+      logprintf (LOG_NOTQUIET, _("\nGot %s. Performing cleanup.\n"),
+                 ws_event_name(dwEvent));
+      cleanup();
+      Sleep(500);
+      return TRUE;
+
     default:
       return FALSE;
     }
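A table-driven alternative to the chained ?: in ws_event_name(), offered only as a sketch: the function name is hypothetical, and the CTRL_*_EVENT constants are defined locally (with the Windows SDK's values) only when <windows.h> is not used, so the lookup can be tried on any host.

```c
#include <stddef.h>

#ifdef _WIN32
# include <windows.h>
#else
/* Values from the Windows SDK (wincon.h), defined here only so that the
   sketch also builds on non-Windows hosts. */
typedef unsigned long DWORD;
# define CTRL_C_EVENT        0
# define CTRL_BREAK_EVENT    1
# define CTRL_CLOSE_EVENT    2
# define CTRL_LOGOFF_EVENT   5
# define CTRL_SHUTDOWN_EVENT 6
#endif

/* Table-driven variant of ws_event_name(): adding an event is one more
   row instead of one more nested ?: branch. */
static const char *event_name_sketch (DWORD event)
{
  static const struct
  {
    DWORD id;
    const char *name;
  } names[] = {
    { CTRL_C_EVENT,        "CTRL_C_EVENT" },
    { CTRL_BREAK_EVENT,    "CTRL_BREAK_EVENT" },
    { CTRL_CLOSE_EVENT,    "CTRL_CLOSE_EVENT" },
    { CTRL_LOGOFF_EVENT,   "CTRL_LOGOFF_EVENT" },
    { CTRL_SHUTDOWN_EVENT, "CTRL_SHUTDOWN_EVENT" },
  };

  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (names[i].id == event)
      return names[i].name;
  return "UNKNOWN EVENT";
}
```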

--

--gv




Re: [Bug-wget] [PATCH 02/14] Do not use exit() with a magic number

2014-06-12 Thread Gisle Vanem

gscriv...@gnu.org wrote:

...


diff --git a/src/mswindows.c b/src/mswindows.c
index 179773e..c0d9be6 100644
--- a/src/mswindows.c
+++ b/src/mswindows.c
@@ -308,7 +308,7 @@ cleanup:

  /* We're the parent.  If all is well, terminate.  */
  if (rv)
-exit (0);
+exit (WGET_EXIT_SUCCESS);


This breaks compilation of mswindows.c, as I wrote to Giuseppe
privately; add #include "exits.h" at the top.

--gv



Re: [Bug-wget] [Bug-Wget][Patch] Implement --show-progress

2014-05-01 Thread Gisle Vanem

Darshit Shah dar...@gmail.com wrote:


In a gist, this patch:
1. Cleans up output in case of dot-style progress bar by not printing extra
newlines when not needed
2. Ensures all the columns are perfectly aligned in all (?) cases


Not quite; as you said, the indenting on continuation lines is a feature. Okay, but the
"--.-K/s" is indented 2 spaces relative to the 1st line and the
"in 0.004s" is indented 1 space, as in this example:


wget.exe -rApng --show-progress -q ftp://ftp.watt-32.net/watt-doc/

ftp.watt-32.net/watt-doc/.listing [=  ] 56,013 
77.2KB/s   in 0.7s
tp.watt-32.net/watt-doc/doxygen.p 100%[] 1,576   
--.-K/s   in 0.004s
tp.watt-32.net/watt-doc/ftv2blank 100%[] 174 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2doc.p 100%[] 255 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2folde 100%[] 259 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2folde 100%[] 261 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2lastn 100%[] 233 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2link. 100%[] 358 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2mlast 100%[] 160 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2mnode 100%[] 194 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2node. 100%[] 235 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2plast 100%[] 165 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2pnode 100%[] 200 
--.-K/s   in 0s
tp.watt-32.net/watt-doc/ftv2vertl 100%[] 229 
--.-K/s   in 0.001s

This may have something to do with the .listing file, since with HTTP everything looks 
okay.

(Use a fixed-width font to see the above misalignment.)
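One way to keep such columns aligned, sketched under assumptions: give every field a fixed printf() width and cut over-long file names to exactly that width. The function, the 33-column name width and the field layout below are hypothetical and not taken from Wget's progress.c.

```c
#include <stdio.h>
#include <string.h>

#define NAME_COLS 33   /* hypothetical width of the file-name column */

/* Hypothetical formatter: every field gets a fixed width, so the columns
   line up whatever the lengths of the name, size or speed strings. */
static int format_progress_row (char *buf, size_t size, const char *name,
                                int percent, long long bytes,
                                const char *speed, const char *elapsed)
{
  size_t len = strlen (name);
  /* Keep only the last NAME_COLS characters of a long name. */
  const char *shown = len > NAME_COLS ? name + (len - NAME_COLS) : name;

  return snprintf (buf, size, "%-*s %3d%% %10lld %9s in %s",
                   NAME_COLS, shown, percent, bytes, speed, elapsed);
}
```

Because each field has a fixed width, " in " starts at the same column on every row, whether the name is short or truncated.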

--gv



Re: [Bug-wget] mirroring a Blogger blog without the comments

2014-04-25 Thread Gisle Vanem

j045...@gmail.com wrote:


Even more general would be something like --next-urls-cmd=CMD, where you
could supply a command that accepts an HTTP response on stdin, and then
writes the set of URLs to stdout which should be crawled based on it. 


You could use Lynx to extract all the links with: 
 lynx -dump --listonly URL > urls.file


Edit / grep the 'urls.file' and use the 'wget -i' option to download what
you want. From 'man wget':

-i file
--input-file=file
Read URLs from a local or external file.  If - is specified as
file, URLs are read from the standard input.  (Use ./- to read from
a file literally named -.)

If this function is used, no URLs need be present on the command
line.  If there are URLs both on the command line and in an input
file, those on the command lines will be the first ones to be
retrieved.  If --force-html is not specified, then file should
consist of a series of URLs, one per line.

However, if you specify --force-html, the document will be regarded
as html.  In that case you may have problems with relative links,
which you can solve either by adding <base href="url"> to the
documents or by specifying --base=url on the command line.

If the file is an external one, the document will be automatically
treated as html if the Content-Type matches text/html.
Furthermore, the file's location will be implicitly used as base
href if none was specified.
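The "edit / grep" step could also be a tiny filter between Lynx and 'wget -i -'. A minimal sketch, assuming Lynx's link list looks like "   1. http://host/path" (a numbered list under a "References" heading); the function names and the exclusion substring are hypothetical, e.g. a query parameter that only comment URLs carry.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return a pointer to the URL inside LINE, or NULL if there is none.
   Assumes Lynx-style lines such as "   1. http://host/path". */
static const char *find_url (const char *line)
{
  const char *sep = strstr (line, "://");
  if (sep == NULL)
    return NULL;
  const char *start = sep;
  while (start > line && isalpha ((unsigned char) start[-1]))
    start--;                        /* back up over the scheme name */
  return start == sep ? NULL : start;
}

/* Copy URLs from IN to OUT, one per line, skipping lines without a URL
   and URLs containing EXCLUDE (a plain substring, not a regex).
   Returns the number of URLs written. */
static int filter_urls (FILE *in, FILE *out, const char *exclude)
{
  char line[4096];
  int kept = 0;

  while (fgets (line, sizeof line, in))
    {
      line[strcspn (line, "\r\n")] = '\0';   /* strip the line ending */
      const char *url = find_url (line);
      if (url == NULL || strstr (url, exclude) != NULL)
        continue;
      fprintf (out, "%s\n", url);
      kept++;
    }
  return kept;
}
```

Wrapped in a small main() that calls filter_urls (stdin, stdout, "..."), it would sit in a pipeline such as 'lynx -dump --listonly URL | ./filter | wget -i -'.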

--gv



Re: [Bug-wget] --progress should not be overridden by --quiet

2014-04-17 Thread Gisle Vanem

Darshit Shah dar...@gmail.com wrote:


progress bar to me. It does not change the existing output so we have
full backward compatibility with existing scripts, but allows the user
to explicitly display the progress bar if required.


I don't see that your patch considers the Windows console-title
progress indicator. E.g. in retr.c:

#ifdef WINDOWS
  if (toread > 0 && !opt.quiet)
    ws_percenttitle (100.0 *
                     (startpos + sum_read) / (startpos + toread));
#endif

Maybe this should test for this new option too? Something like:
  if (toread > 0 && (!opt.quiet || opt.show_progress))
...

But mswindows.c always needs a 'curr_url' to show the percentage, and
I don't think 'ws_changetitle()' will be called when needed for this to
happen. IMHO the best would be to drop 'ws_changetitle()' and 
always pass the current URL as a parameter to 'ws_percenttitle()'.
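A sketch of the suggested direction, under stated assumptions: a title formatter that receives the current URL as a parameter (so no separate 'curr_url' has to be set first) and refuses to compute a percentage when the total size is unknown. The name and signature are hypothetical; this is not the existing ws_percenttitle().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical title formatter: writes e.g. "42% http://host/file" into
   BUF.  Returns false (writing nothing) when the total size is unknown
   or no URL is given, which also avoids a division by zero. */
static bool format_percent_title (char *buf, size_t size, const char *url,
                                  long long startpos, long long sum_read,
                                  long long toread)
{
  if (toread <= 0 || url == NULL)
    return false;

  double percent = 100.0 * (startpos + sum_read) / (startpos + toread);
  if (percent > 100.0)
    percent = 100.0;              /* clamp if more arrived than expected */

  snprintf (buf, size, "%.0f%% %s", percent, url);
  return true;
}
```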


--gv



Re: [Bug-wget] statically link wget on mingw32 with win-builds

2014-04-06 Thread Gisle Vanem

Ángel González keis...@gmail.com wrote:


I have built without problems wget for windows leaving out ssl support.
If you also want https, you simply need a static version of openssl/gnutls as
Giuseppe says. Which mostly depends on the headaches those libraries give
you :)


And I have made a MingW makefile (attached) with an option to use
only static libs, even for OpenSSL. Running:

cygcheck wget.exe
./\wget.exe
 f:\windows\system32\advapi32.dll
   f:\windows\system32\KERNEL32.dll
 f:\windows\system32\ntdll.dll
   f:\windows\system32\RPCRT4.dll
 f:\windows\system32\Secur32.dll
 f:\windows\system32\gdi32.dll
   f:\windows\system32\USER32.dll
 f:\windows\system32\msvcrt.dll
 f:\windows\system32\ws2_32.dll
   f:\windows\system32\WS2HELP.dll

shows only the normal system DLLs. And 'wget -V' has:

+digest +https +ipv6 +iri +large-file -nls +ntlm +opie +ssl/openssl
+zlib

Attached.

--gv




Re: [Bug-wget] [PATCH] add support for SO_MARK: --so-mark

2014-01-29 Thread Gisle Vanem

Alban Crequy alban.cre...@collabora.co.uk wrote:


I'm not sure this is a feature that wget would want, but it was useful to me
for testing traffic control.


I'm sure it was. But you should at least wrap that code in
'#ifdef SO_MARK'; Winsock doesn't have it.
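A minimal sketch of the suggested guard, for POSIX hosts (a Winsock build would include <winsock2.h> instead of <sys/socket.h>): the option is used only when the headers define SO_MARK, and elsewhere the helper fails cleanly with ENOPROTOOPT. The helper name is hypothetical. On Linux, setting SO_MARK typically needs CAP_NET_ADMIN, so it can fail with EPERM even where it compiles.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: tag SOCK with MARK where the platform supports
   SO_MARK; otherwise fail with ENOPROTOOPT instead of breaking the build.
   Returns 0 on success, -1 on failure (errno set). */
static int set_socket_mark (int sock, unsigned int mark)
{
#ifdef SO_MARK
  return setsockopt (sock, SOL_SOCKET, SO_MARK, &mark, sizeof mark);
#else
  (void) sock;
  (void) mark;
  errno = ENOPROTOOPT;   /* e.g. Winsock: the option simply doesn't exist */
  return -1;
#endif
}
```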

--gv



Re: [Bug-wget] GNU wget 1.15 released

2014-01-23 Thread Gisle Vanem

Ray Satiro raysat...@yahoo.com wrote:


wget -d  https://encrypted.google.com

..

Yeah I'm getting a flood of that when I -d https
seconds 900.00, Winsock error: 0
seconds 900.00, Winsock error: 0
seconds 900.00, Winsock error: 0
seconds 900.00, Winsock error: 0
seconds 900.00, Winsock error: 0
seconds 900.00, Winsock error: 0
seconds 900.00, Winsock error: 0
over and over


I've figured out this bug (or annoyance). In the case of google.com, there are lots of
IPs (11 now), and Wget seems to connect (or do something) to each of these IPs 
using run_with_timeout(). 

In any case, there are no errors in those spawned threads. I originally wrote 
'run_with_timeout()' for Windows back in 2003 and used 'struct thread_data' 
to pass back the Winsock error codes, which are per-thread. When 
'thread_arg.ws_error' is 0, the debug message should not be printed. Hence
this diff should fix it:

--- orig/src/mswindows.c 2013-11-18 16:31:41 +
+++ src/mswindows.c 2014-01-23 13:48:27 +
@@ -559,7 +559,10 @@
       /* Propagate error state (which is per-thread) to this thread,
          so the caller can inspect it.  */
       WSASetLastError (thread_arg.ws_error);
+      if (thread_arg.ws_error)
       DEBUGP (("Winsock error: %d\n", WSAGetLastError ()));
+      else
+        DEBUGP (("\n"));
       rc = false;
     }
   else
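To illustrate why the error has to be passed back explicitly: errno, like the Winsock last-error value, is per-thread, so a value set in a worker thread is invisible to the caller unless it is copied into shared state. A sketch using POSIX threads and errno as stand-ins for Wget's Windows thread and WSAGetLastError(); all names are hypothetical. As in the patch, a debug line is printed only when the propagated error is non-zero.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical analogue of 'struct thread_data': the worker copies its
   per-thread error value here so the caller can see it after joining. */
struct thread_result
{
  int simulated_error;   /* error the worker should "fail" with; 0 = none */
  int saved_errno;       /* the worker's errno, copied out explicitly */
};

static void *worker (void *arg)
{
  struct thread_result *res = arg;
  errno = res->simulated_error;   /* changes only this thread's errno */
  res->saved_errno = errno;       /* propagate it to the caller */
  return NULL;
}

/* Run the worker in its own thread, print a debug line only if it
   reported an error, then restore that error in the calling thread
   (cf. WSASetLastError).  Returns the propagated error, 0 if none,
   or -1 if the thread could not be started. */
static int run_and_report (int simulated_error)
{
  struct thread_result res = { simulated_error, 0 };
  pthread_t tid;

  if (pthread_create (&tid, NULL, worker, &res) != 0)
    return -1;
  pthread_join (tid, NULL);

  if (res.saved_errno)
    fprintf (stderr, "worker error: %d\n", res.saved_errno);
  errno = res.saved_errno;
  return res.saved_errno;
}
```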

--

--gv



Re: [Bug-wget] GNU wget 1.15 released

2014-01-22 Thread Gisle Vanem

Giuseppe Scrivano gscriv...@gnu.org wrote:


I am pleased to announce the new version of GNU wget.


..

Please report any problem you may experience to the bug-wget@gnu.org
mailing list.


Okay. I'm having an issue with ioctl() in mswindows.c. It's really rpl_ioctl()
from gnulib:

.text  0x004607a8   0x3c 
g:/MingW32/src/gnu/gnulib/lib/libgnulib.a(ioctl.o)
   0x004607a8rpl_ioctl

On almost any URL, wget aborts with this message:

wget -d http://bla-bla

DEBUG output created by Wget 1.15 (MingW) on Windows_NT.
...
---request end---
ioctl() failed.  The socket could not be set as blocking.
Winsock error: 0

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

-

There should be no error; see Winsock error: 0. I've just commented out the
if-statement with the abort() and things just work fine AFAICS.

This is on Win-XP SP3, wget built with MingW+gcc 4.7.2.

--gv



Re: [Bug-wget] GNU wget 1.15 released

2014-01-22 Thread Gisle Vanem

Ray Satiro raysat...@yahoo.com wrote:


I'm pretty sure I wrote that code. I would not edit it out, if you are seeing 
it there is likely a problem with your build.


Probably, or with gnulib on MingW.


You say *almost* any URL? Can you give an example? What happens if you try to 
get google's page:
wget http://www.google.com
wget --ca-certificate=c:\wget\root\bin\cacert.pem https://encrypted.google.com


I've built Wget (for many years) with all protocols enabled. And I already have a 
'ca-certificate' in my wgetrc-file. https works fine:


 wget -d  https://encrypted.google.com

DEBUG output created by Wget 1.15.00 (MingW) on Windows_NT.
...
Initiating SSL handshake.
seconds 900.00, Winsock error: 0   (!! this annoying stuff seems to be caused 
by gnulib too)
Handshake successful; connected socket 3 to SSL handle 0x00e336e8
certificate:
 subject: /C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.google.com
 issuer:  /C=US/O=Google Inc/CN=Google Internet Authority G2
X509 certificate successfully verified and matches host encrypted.google.com

-

It's only the ioctl() problem. I think I've figured out why: somehow
I built gnulib w/o 'WINDOWS_SOCKETS=1'. So that seems to be sorted now.

FYI, the errno set by the failing ioctl() was of course ENOSYS, since it
didn't handle sockets.

Well, there probably should be an error. The error you are seeing has 
come up before because when gnulib is built there could be some 
erroneous mixing of select and ioctl; one is native and the other is a 
gnu wrapper or something like that. Really they both should be gnulib 
wrappers, I think.


I know. Running depends wget.exe shows that libgnulib.dll has wrappers
for select + ioctl etc. Wget only imports these directly from ws2_32.dll:

 freeaddrinfo
 getaddrinfo
 htons
 ntohs
 WSAAddressToStringA
 WSACleanup
 WSAGetLastError
 WSASetLastError
 WSAStartup

So the rest of the socket stuff is handled in gnulib.dll. But I don't see
how close-a-socket is imported from gnulib. There is no rpl_close
imported from the gnulib DLL. Only rpl_fclose?!

--gv



[Bug-wget] [Patch] src/build_info.c.in for Windows

2013-11-18 Thread Gisle Vanem

commit 21d33c85fb72ef6518bdbf331cabc3056eda8561
Author: Gisle Vanem gva...@yahoo.no
Date:   Mon Nov 18 14:50:01 2013 +

   On Windows we always support large files, since 'wgint' is
   set to 64-bit in mswindows.h.

diff --git a/src/build_info.c.in b/src/build_info.c.in
index c0b1677..7298149 100644
--- a/src/build_info.c.in
+++ b/src/build_info.c.in
@@ -2,7 +2,7 @@ digest      defined ENABLE_DIGEST
 https       defined HAVE_SSL
 ipv6        defined ENABLE_IPV6
 iri         defined ENABLE_IRI
-large-file  SIZEOF_OFF_T >= 8
+large-file  (SIZEOF_OFF_T >= 8) || defined WINDOWS

 nls         defined ENABLE_NLS
 ntlm        defined ENABLE_NTLM
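The same rule expressed as plain C, as a sketch: build_info.c.in evaluates SIZEOF_OFF_T in the preprocessor (sizeof cannot be used in '#if'), so this version checks sizeof (off_t) at run time instead and treats any _WIN32 build, standing in for Wget's WINDOWS macro, as large-file capable. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <sys/types.h>

/* On Windows the file-size type is 64-bit regardless of off_t (per the
   commit message, Wget's 'wgint'); _WIN32 stands in for WINDOWS here. */
#ifdef _WIN32
# define SKETCH_ALWAYS_LARGE_FILE true
#else
# define SKETCH_ALWAYS_LARGE_FILE false
#endif

/* Hypothetical run-time version of the rule
   "(SIZEOF_OFF_T >= 8) || defined WINDOWS". */
static bool has_large_file_support (void)
{
  return SKETCH_ALWAYS_LARGE_FILE || sizeof (off_t) >= 8;
}

/* Format the flag the way 'wget -V' lists features. */
static const char *large_file_feature (void)
{
  return has_large_file_support () ? "+large-file" : "-large-file";
}
```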



--gv



[Bug-wget] [Patch] src/build_info.c.in for zlib

2013-11-18 Thread Gisle Vanem

commit b950d30bc50d5e52ac1a0ab7050e5ce0b6a35e82
Author: Gisle Vanem gva...@yahoo.no
Date:   Mon Nov 18 15:01:37 2013 +

   Add {+|-}zlib to 'compiled_features[]'.

diff --git a/src/build_info.c.in b/src/build_info.c.in
index 7298149..471ba40 100644
--- a/src/build_info.c.in
+++ b/src/build_info.c.in
@@ -7,6 +7,7 @@ large-file  (SIZEOF_OFF_T >= 8) || defined WINDOWS
 nls         defined ENABLE_NLS
 ntlm        defined ENABLE_NTLM
 opie        defined ENABLE_OPIE
+zlib        defined HAVE_LIBZ

 ssl         choice:
 openssl     defined HAVE_LIBSSL || defined HAVE_LIBSSL32
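For readers unfamiliar with how such a feature list ends up as "+zlib" or "-zlib", a sketch of the idea: the preprocessor picks one string per feature and the NULL-terminated array is printed in order. The array and function names are hypothetical; the real compiled_features[] is, as far as I know, generated from build_info.c.in.

```c
#include <stdio.h>

/* Hypothetical feature table: one "+name" or "-name" per feature,
   selected at compile time, terminated by NULL. */
static const char *sketch_features[] = {
#ifdef HAVE_LIBZ
  "+zlib",
#else
  "-zlib",
#endif
#ifdef ENABLE_NTLM
  "+ntlm",
#else
  "-ntlm",
#endif
  NULL
};

/* Print the features space-separated on one line, like 'wget -V'. */
static void print_features (FILE *out)
{
  for (const char **f = sketch_features; *f; f++)
    fprintf (out, "%s%s", f == sketch_features ? "" : " ", *f);
  fputc ('\n', out);
}
```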

--

--gv



Re: [Bug-wget] Broken 'DEBUG_MALLOC' code

2013-11-11 Thread Gisle Vanem

Tim Ruehsen tim.rueh...@gmx.de wrote:


init.c:1738:12: error: 'url' undeclared (first use in this function)
xfree (url[i]);

Thanks for pointing this out.
Are you going to create a patch ?


Attached are my diffs.

--gv
diff --git a/src/init.c b/src/init.c
index 84ae654..9ff374b 100644
--- a/src/init.c
+++ b/src/init.c
@@ -1700,8 +1700,11 @@ void spider_cleanup (void);

/* Free the memory allocated by global variables.  */
void
-cleanup (void)
+cleanup (int num_urls, char ***url_list)
{
+  int i;
+  char **url = *url_list;
+
  /* Free external resources, close files, etc. */

  /* Close WARC file. */
@@ -1734,8 +1737,8 @@ cleanup (void)
  host_cleanup ();
  log_cleanup ();

-  for (i = 0; i < nurl; i++)
-    xfree (url[i]);
+  for (i = 0; i < num_urls; i++)
+    xfree (url[i]);

  {
extern acc_t *netrc_list;

diff --git a/src/init.h b/src/init.h
index 21ebee5..ab75205 100644
--- a/src/init.h
+++ b/src/init.h
@@ -39,7 +39,7 @@ void initialize (void);
void run_command (const char *);
void setoptval (const char *, const char *, const char *);
char *home_dir (void);
-void cleanup (void);
+void cleanup (int num_urls, char ***url_list);
void defaults (void);
bool run_wgetrc (const char *file);

diff --git a/src/main.c b/src/main.c
index 19d7253..7168239 100644
--- a/src/main.c
+++ b/src/main.c
@@ -1701,7 +1701,7 @@ outputting to a regular file.\n"));
   if (opt.convert_links && !opt.delete_after)
     convert_all_links ();

-  cleanup ();
+  cleanup (nurl, &url);

  exit (get_exit_status ());
}
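A standalone sketch of the calling convention in this patch: the caller passes the count and the address of its char ** array, and the callee frees each string and, since it has the address, can also clear the caller's pointer. The function name is hypothetical, and it uses free() instead of Wget's xfree(). It deliberately does not free the array itself, which the caller owns and which may not even be heap-allocated.

```c
#include <stdlib.h>

/* Hypothetical cleanup helper: free the NUM_URLS strings in *URL_LIST,
   then reset the caller's pointer so nothing dangles. */
static void free_url_list (int num_urls, char ***url_list)
{
  char **url = *url_list;
  if (url == NULL)
    return;

  for (int i = 0; i < num_urls; i++)
    {
      free (url[i]);
      url[i] = NULL;
    }
  *url_list = NULL;   /* the array itself still belongs to the caller */
}
```

The call site then reads 'free_url_list (nurl, &url);', passing the address of the array rather than the array itself.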

Re: [Bug-wget] Broken 'DEBUG_MALLOC' code

2013-11-08 Thread Gisle Vanem

Tim Ruehsen tim.rueh...@gmx.de wrote:
  ^

init.c:1738:12: error: 'url' undeclared (first use in this function)
xfree (url[i]);

Thanks for pointing this out.
Are you going to create a patch ?


Okay. Later next week when I have the time.

--gv



[Bug-wget] Broken 'DEBUG_MALLOC' code

2013-11-07 Thread Gisle Vanem

Hi list.

Nice to know that Wget is still being developed. I used to contribute quite
a bit when Hrvoje Niksic was the chief. But IMHO it's a pity gnulib is used now;
it excludes a lot of Wget targets, like djgpp, MSVC and Watcom, which I once
patched for. Building gnulib using MingW was easy, but I'm not sure other
non-gcc compilers would be that easy.

Anyway, to the point. I tried my old MingW makefile on the latest git repo, and it
stumbled on some code inside 'DEBUG_MALLOC', here in init.c (line 1737):

   for (i = 0; i < nurl; i++)
     xfree (url[i]);

Should 'nurl' and 'url' in main.c be made public?

There are some other Windows issues which I can come back to.

PS.
 I'm resending this once again. I tried yesterday. Still no receipt or copy of
 my own message from MailMan. I really am subscribed. Seems MailMan is
 overworked or broken.

--gv




Re: [Bug-wget] [Bulk] Broken 'DEBUG_MALLOC' code

2013-11-07 Thread Gisle Vanem

Gisle Vanem gva...@yahoo.no wrote:


PS.
 I'm resending this once again. I tried yesterday. Still no receipt or copy of
 my own message from MailMan. I really am subscribed. Seems MailMan is
 overworked or broken.


It seems so:
 X-Mailman-Approved-At: Thu, 07 Nov 2013 16:36:26 -0500

It got delivered here to my Yahoo 9 hours later. Way to go Mailman!

--gv





[Bug-wget] [Patch] main.c

2009-02-28 Thread Gisle Vanem

The Wget-patches list seems to have fallen off the net, so I'm sending
you this simple patch directly.


2009-02-27  Gisle Vanem  gva...@broadpark.no

 * main.c: freopen (NULL, ...) causes an assertion in MSVC debug-mode,
   i.e. NULL isn't legal there. But the "CONOUT$" device works fine.



--- hg-latest/src/main.c Thu Feb 26 16:58:03 2009
+++ ./main.c Fri Feb 27 07:46:13 2009
@@ -1124,7 +1124,7 @@
     {
 #ifdef WINDOWS
       FILE *result;
-      result = freopen (NULL, "wb", stdout);
+      result = freopen ("CONOUT$", "wb", stdout);
       if (result == NULL)
         {
           logputs (LOG_NOTQUIET, _("\
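For comparison, a sketch of another way to get a binary stdout on Windows: _setmode() from the Microsoft C runtime (<io.h>, <fcntl.h>) changes the mode of the existing stream instead of reopening it. This is an alternative, not what the patch does. One difference worth noting is that reopening "CONOUT$" always targets the console, so output that was redirected with '>' would no longer go to the file. The function name is hypothetical.

```c
#include <stdio.h>
#ifdef _WIN32
# include <fcntl.h>
# include <io.h>
#endif

/* Switch stdout to binary mode without reopening it.
   Returns 0 on success, -1 on failure. */
static int stdout_set_binary (void)
{
#ifdef _WIN32
  /* _setmode() returns the previous mode, or -1 on error. */
  return _setmode (_fileno (stdout), _O_BINARY) == -1 ? -1 : 0;
#else
  /* POSIX streams make no text/binary distinction: nothing to do. */
  return 0;
#endif
}
```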

--gv