Re: Download hangs if stderr is redirected to stdout

2021-10-12 Thread Ander Juaristi

Hi,

Though Bash bites me from time to time (so I might be wrong), 
the last statement doesn't look like valid syntax.


What you're doing with '2&>1' is redirecting both stdout and stderr to a 
file called '1'. The remaining argument '2' is then taken as another URL to 
download. So wget gets the first file from GitHub successfully, and then 
tries to get http://2, which hangs.


The correct syntax to redirect stderr to stdout is '2>&1', as in your first example.
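
You can see how the shell parses it with a harmless command in place of 
wget (a sketch; 'echo' just makes the parsing visible):

$ echo foo 2&>1    # the shell reads this as:  echo foo 2 &>1
$ cat 1
foo 2

That is: an extra argument '2', plus both stdout and stderr sent to a file 
named '1'.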

On 2021-10-09 01:09, Joseph Marchand wrote:

#this works
wget 
'https://github.com/mikehooper/Minecraft/raw/main/lwjgl3arm32.tar.gz' 
2>&1


#this works
wget 
'https://github.com/mikehooper/Minecraft/raw/main/lwjgl3arm64.tar.gz'


#this works
wget 
'https://github.com/mikehooper/Minecraft/raw/main/lwjgl3arm64.tar.gz' 
1>&2


#this works
wget
'https://github.com/mikehooper/Minecraft/raw/main/lwjgl3arm64.tar.gz'
&>/dev/null

#this hangs forever
wget 
'https://github.com/mikehooper/Minecraft/raw/main/lwjgl3arm64.tar.gz' 
2&>1


This problem only occurs with THIS exact URL and THIS exact 
output redirection.


Any ideas?




Re: TLS-PSK and TLS-SRP support?

2020-09-16 Thread Ander Juaristi

On 2020-09-15 19:05, Witold Baryluk wrote:

Hi,

I love wget, but I can't find whether it supports the PSK or SRP protocols.
The underlying OpenSSL supports them, and it would be nice to use them with
wget, especially with TLS v1.2 and TLS v1.3.

I am mostly interested in PSK, but SRP support would also be very
useful. I have an HTTP server that uses TLS v1.3 (and 1.2) and
uses PSK for mutual authentication and encryption. I verified it is
working using various tools and code, but it can't easily be used
with generic tools like wget. There is work in curl to add support
for PSK too.


Indeed, none of these are supported.

SRP is a legacy protocol nowadays. It pursued interesting goals, such as 
making the password hard to brute-force.

Unfortunately, nobody cared enough to maintain it, and it's not up 
to modern-day standards.

AFAIK SHA-1, which is insecure, is the only supported hash function; there 
is no elliptic-curve equivalent; and it hasn't been adapted to TLS 1.3.

PSK is a different issue. I personally wouldn't oppose supporting it. 
I'd rather not let the user manage PSKs directly, but that's a 
matter of taste. What's more, PSKs are the basis of 0-RTT in TLS 1.3, and 
IMO that's a good thing wget2 should do implicitly.
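
For reference, a minimal sketch of what wiring PSK into an OpenSSL client
looks like (this is the classic TLS <= 1.2 callback API; TLS 1.3 has a
newer session-based one, and the identity/key values below are made up
for illustration):

#include <openssl/ssl.h>
#include <string.h>

/* Hypothetical hard-coded credentials, just for illustration. */
static const char *psk_identity = "client1";
static const unsigned char psk_key[] = { 0x1a, 0x2b, 0x3c, 0x4d };

static unsigned int
psk_client_cb (SSL *ssl, const char *hint,
               char *identity, unsigned int max_identity_len,
               unsigned char *psk, unsigned int max_psk_len)
{
  (void) ssl; (void) hint;  /* unused in this sketch */

  if (strlen (psk_identity) + 1 > max_identity_len
      || sizeof psk_key > max_psk_len)
    return 0;  /* returning 0 aborts the handshake */

  strcpy (identity, psk_identity);
  memcpy (psk, psk_key, sizeof psk_key);
  return sizeof psk_key;  /* number of PSK bytes placed in 'psk' */
}

/* Then, during SSL context setup:
     SSL_CTX_set_psk_client_callback (ctx, psk_client_cb);  */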




If it is supported (maybe via the --ciphers option), it seems undocumented.
At least I don't see a way to provide a PSK or the SRP parameters (as far
as I know, they can't be provided via --private-key). An option to provide
the secret on the command line, or via an ASCII, binary or hex file, would
be best (so as not to leak the password via /proc/*/cmdline).


Thank you!




Re: Wget2 bootstrap and "Unknown option: --"

2020-08-25 Thread Ander Juaristi

On 2020-08-10 19:01, Darshit Shah wrote:

Interesting.. Thanks for the report.

However, it may make sense to keep this behavior since I don't think
that the script is compatible with Python 2.


I recently ran the entire test suite on Alpine with Python 3.8.0 
(actually a 'python' symlink pointing to python3), including the 
bootstrap script, with success.



Re: How to send a POST request by wget same to a httpie request?

2020-07-21 Thread Ander Juaristi

You can send POST requests with --method=POST.

Multipart bodies are not, and will probably never be, supported in wget.

There is ongoing work to implement them in wget2, however [0]. You 
might want to check it out.


Having said this, in wget you can send the contents of a file as a body 
with the --body-file option. This is not the same as a multipart body, 
but it might fit your use case. What --body-file does is send the file's 
contents directly in the body of the HTTP request. There is also 
--post-file, which is basically --method=POST + --body-file.
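
For instance (a sketch; the Content-Type value is whatever your server
expects, since this sends the raw file and wget builds no multipart
boundary):

wget --method=POST --body-file=1.txt --header='Content-Type: text/plain' http://localhost:9000/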


Cheers,
- AJ

[0] https://gitlab.com/gnuwget/wget2


On 2020-07-02 00:32, Peng Yu wrote:

$ http --form POST localhost:9000 f...@1.txt

The above httpie (https://httpie.org/) command will send the following
POST request. Could anybody let me know what is the equivalent wget
command to achieve the same HTTP request? Thanks.


POST / HTTP/1.1
Host: localhost:9000
User-Agent: HTTPie/2.2.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 171
Content-Type: multipart/form-data; 
boundary=36922889709f11dcba960da4b9d51a2e


--36922889709f11dcba960da4b9d51a2e
Content-Disposition: form-data; name="file"; filename="1.txt"
Content-Type: text/plain

abc

--36922889709f11dcba960da4b9d51a2e--




[bug #58097] Wget doesn't download intermediate certificates when not supplied in the response

2020-05-06 Thread Ander Juaristi
Follow-up Comment #2, bug #58097 (project wget):

Just for the record, I think I'm experiencing this same issue while implementing the
OpenSSL backend in wget2. As soon as I figure out how to solve it there, I can
back-port the fix to wget.





Re: [Bug-wget] Pha support for tls1.3

2019-05-24 Thread Ander Juaristi

Hi Tim,

Looks good. Could you merge it please?

Thanks
- AJ

On 23/3/19 18:04, Tim Rühsen wrote:

Thank you Daniel and Diresh.

I don't think we should send the post handshake extension in case no
client certificate is given.

The OpenSSL documentation is pretty silent about what happens when a
server requests a post-handshake authentication. What I found is that some
kind of callback function is mentioned, but I didn't find an example at a
quick glance.

I add Ander Juaristi, since he promised to maintain the OpenSSL code of
Wget until the end of his life, hehe ;-)

Regards, Tim

On 23.03.19 10:20, a...@cyberstorm.mu wrote:

Hello all,

A re-work was done on the patch as Daniel suggested.

Please find the updated gist in the link below:
https://gist.github.com/AviSoomirtee/22c1b698c796177d836323ef506665a5

Could you provide feedback about the change?
Thanks.

Regards,
Diresh Soomirtee.

On Friday, March 22, 2019 22:23 CET, Daniel Stenberg  wrote:
  

On Fri, 22 Mar 2019, Tim Rühsen wrote:


Are you sure that '#ifdef SSL_CTX_set_post_handshake_auth' works? Here with
OpenSSL 1.1.1b it seems that 'SSL_CTX_set_post_handshake_auth' is a function
and not a #define.


In curl we use this #ifdef magic for figuring out if the function is
present:

#if ((OPENSSL_VERSION_NUMBER >= 0x10101000L) && \
!defined(LIBRESSL_VERSION_NUMBER) && \
!defined(OPENSSL_IS_BORINGSSL))
#define HAVE_SSL_CTX_SET_POST_HANDSHAKE_AUTH
#endif

--

/ daniel.haxx.se




  






Re: [Bug-wget] Pha support for tls1.3

2019-04-30 Thread Ander Juaristi

Hi,

Let me test this against gnutls-cli (with --post-handshake-auth),
and I'll tell you something ASAP. It's a short patch; it shouldn't be too 
hard to verify.


On 23/3/19 18:04, Tim Rühsen wrote:

Thank you Daniel and Diresh.

I don't think we should send the post handshake extension in case no
client certificate is given.


Neither do I. It's basically nonsense. But I don't know if OpenSSL is 
smart enough in this regard.




Re: [Bug-wget] Docs missing info on ca_directory and ca_certfile

2019-01-03 Thread Ander Juaristi
Hi,

The patch looks good to me. As Tim says, I would also pass NULL as the
second param in line 20. If we provide --ca-directory, what happens
is that OpenSSL picks up the most suitable certificate from the
directory based on the hash value of the name, and some other field I
don't remember. GnuTLS will consider all of them. In the end it's the
same behavior.
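
For context, a minimal sketch of the call being discussed (the variable
names here are ours, not wget's):

/* Load trusted CAs either from a single PEM file or from a hashed
   directory; the unused argument may be NULL. */
if (opt_ca_cert)
  SSL_CTX_load_verify_locations (ctx, opt_ca_cert, NULL);
else if (opt_ca_directory)
  SSL_CTX_load_verify_locations (ctx, NULL, opt_ca_directory);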

Tim, could you merge the patch?

On 29/12/18 17:54, Jeffrey Walton wrote:
> On Sat, Dec 29, 2018 at 11:43 AM Tim Rühsen  wrote:
>>
>> On 29.12.18 05:00, Jeffrey Walton wrote:
>>> On Fri, Dec 28, 2018 at 10:07 PM Jeffrey Walton  wrote:

 The sample wgetrc is missing info on ca_directory . Also see
 https://www.gnu.org/software/wget/manual/html_node/Sample-Wgetrc.html.

 I also cannot figure out how to tell Wget to use cacert.pem. I've
 tried ca_cert, ca_certs and ca_certfile but it produces:

 wget: Unknown command ‘ca_file’ in /opt/bootstrap/etc/wgetrc at line 
 141
 Parsing system wgetrc file failed.
>>>
>>> My bad... I found it. openssl.c used "opt.ca_cert", so I was trying to
>>> use the same in rc file. The correct name is ca_certificate.
>>
>> There are some inconsistencies with the naming in rc files and on the
>> command line. We do not have this any more with wget2.
>>
>>> Tim, you may want this when Wget is built against OpenSSL. It makes
>>> Wget/OpenSSL behave like Wget/GnuTLS:
>>> https://github.com/noloader/Build-Scripts/blob/master/bootstrap/wget.patch
>>
>> Thanks for the pointer.
>>
>> On L20 the second param to SSL_CTX_load_verify_locations can be NULL.
>>
>> I personally don't care much for OpenSSL - I put Ander on CC.
> 
> Yeah, understood.
> 
> The problem I'm facing is that I need a working Wget quickly. Trying to
> build GnuTLS from sources is too heavyweight at this point in the
> process. I can do it later, but I need the lightweight version
> immediately.
> 
> The patch tested OK on Linux back to Fedora 1 with GCC 3. I've still
> got AIX, OS X, Solaris and some other testing to do.
> 
> Jeff
> 




Re: [Bug-wget] Feature request

2018-10-07 Thread Ander Juaristi

Hi,

Most new features now land in wget2 and libwget.

Development is done in GitLab: https://gitlab.com/gnuwget/wget2
Docs: https://gnuwget.gitlab.io/wget2/reference/modules.html

I don't know if that particular feature would fit into wget2, but a
generalisation of it might be a good addition. I'm thinking of some
functions in libwget to set the max/min download speed, or something along
those lines. There aren't currently any such controls, and they'd be a good
addition IMO.

In any case, we're always pleased to receive MRs in the GitLab repository
for discussion.

- AJ


Hello maintainers,
I wrote this script in Python which restarts the wget process if the speed
hits a particular minimum set by the user.
https://github.com/plant99/better-wget
Though it needs a little tidying up, I would love to add this as a feature
to the original wget code. Please guide me on this.
Best Regards,
Shivashis Padhi




[Bug-wget] Some SSL/TLS engine improvements

2018-05-03 Thread Ander Juaristi
The attached patches introduce the --ciphers command line option and 
enhance the SSL/TLS engine in some ways.
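
A usage sketch (the cipher strings are illustrative only; their syntax
depends on whether wget was built against GnuTLS or OpenSSL):

# GnuTLS build: a priority string
wget --ciphers 'PFS:-VERS-TLS-ALL:+VERS-TLS1.2' https://example.com/

# OpenSSL build: a cipher list
wget --ciphers 'HIGH:!aNULL:!kRSA' https://example.com/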


Suggested by Jeffrey Walton some time ago ;)

Regards,
- AJ
From 1d785fb7d138a2a5c5542ec76aaa33a13122e150 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Sat, 28 Apr 2018 14:06:34 +0200
Subject: [PATCH 1/3] Enhance SSL/TLS security

This commit hardens SSL/TLS a bit more in the following ways:

 * Explicitly exclude NULL authentication and the 'MEDIUM' cipher list
   category. Only ciphers in the 'HIGH' level are considered - this
   includes all symmetric ciphers with key lengths larger than 128 bits,
   and some ('modern') 128-bit ciphers, such as AES in GCM mode.
 * Allow RSA key exchange by default, but exclude it when
   Perfect Forward Secrecy is desired (with --secure-protocol=PFS).
 * Introduce new option --ciphers to set the cipher list that the SSL/TLS
   engine will favor. This string is fed directly to the underlying TLS
   library (GnuTLS or OpenSSL) without further processing, and hence its
   format and syntax are directly dependent on the specific library.

Reported-by: Jeffrey Walton 

---
 src/gnutls.c  | 78 ++-
 src/init.c|  1 +
 src/main.c|  5 
 src/openssl.c | 26 +---
 src/options.h |  2 ++
 5 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/src/gnutls.c b/src/gnutls.c
index 0fd8da8c..0368b4a4 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -535,35 +535,10 @@ _sni_hostname(const char *hostname)
   return sni_hostname;
 }
 
-bool
-ssl_connect_wget (int fd, const char *hostname, int *continue_session)
+static int
+set_prio_default (gnutls_session_t session)
 {
-  struct wgnutls_transport_context *ctx;
-  gnutls_session_t session;
-  int err;
-
-  gnutls_init (&session, GNUTLS_CLIENT);
-
-  /* We set the server name but only if it's not an IP address. */
-  if (! is_valid_ip_address (hostname))
-{
-  /* GnuTLS 3.4.x (x<=10) disrespects the length parameter, we have to construct a new string */
-  /* see https://gitlab.com/gnutls/gnutls/issues/78 */
-  const char *sni_hostname = _sni_hostname(hostname);
-
-  gnutls_server_name_set (session, GNUTLS_NAME_DNS, sni_hostname, strlen(sni_hostname));
-  xfree(sni_hostname);
-}
-
-  gnutls_credentials_set (session, GNUTLS_CRD_CERTIFICATE, credentials);
-#ifndef FD_TO_SOCKET
-# define FD_TO_SOCKET(X) (X)
-#endif
-#ifdef HAVE_INTPTR_T
-  gnutls_transport_set_ptr (session, (gnutls_transport_ptr_t) (intptr_t) FD_TO_SOCKET (fd));
-#else
-  gnutls_transport_set_ptr (session, (gnutls_transport_ptr_t) FD_TO_SOCKET (fd));
-#endif
+  int err = -1;
 
 #if HAVE_GNUTLS_PRIORITY_SET_DIRECT
   switch (opt.secure_protocol)
@@ -642,6 +617,53 @@ ssl_connect_wget (int fd, const char *hostname, int *continue_session)
 }
 #endif
 
+  return err;
+}
+
+bool
+ssl_connect_wget (int fd, const char *hostname, int *continue_session)
+{
+  struct wgnutls_transport_context *ctx;
+  gnutls_session_t session;
+  int err;
+
+  gnutls_init (&session, GNUTLS_CLIENT);
+
+  /* We set the server name but only if it's not an IP address. */
+  if (! is_valid_ip_address (hostname))
+{
+  /* GnuTLS 3.4.x (x<=10) disrespects the length parameter, we have to construct a new string */
+  /* see https://gitlab.com/gnutls/gnutls/issues/78 */
+  const char *sni_hostname = _sni_hostname(hostname);
+
+  gnutls_server_name_set (session, GNUTLS_NAME_DNS, sni_hostname, strlen(sni_hostname));
+  xfree(sni_hostname);
+}
+
+  gnutls_credentials_set (session, GNUTLS_CRD_CERTIFICATE, credentials);
+#ifndef FD_TO_SOCKET
+# define FD_TO_SOCKET(X) (X)
+#endif
+#ifdef HAVE_INTPTR_T
+  gnutls_transport_set_ptr (session, (gnutls_transport_ptr_t) (intptr_t) FD_TO_SOCKET (fd));
+#else
+  gnutls_transport_set_ptr (session, (gnutls_transport_ptr_t) FD_TO_SOCKET (fd));
+#endif
+
+  if (!opt.tls_ciphers_string)
+{
+  err = set_prio_default (session);
+}
+  else
+{
+#if HAVE_GNUTLS_PRIORITY_SET_DIRECT
+  err = gnutls_priority_set_direct (session, opt.tls_ciphers_string, NULL);
+#else
+  logprintf (LOG_NOTQUIET, _("GnuTLS: Cannot set prio string directly. Falling back to default priority.\n"));
+  err = gnutls_set_default_priority (session);
+#endif
+}
+
   if (err < 0)
 {
   logprintf (LOG_NOTQUIET, "GnuTLS: %s\n", gnutls_strerror (err));
diff --git a/src/init.c b/src/init.c
index e4186abe..d7e6a723 100644
--- a/src/init.c
+++ b/src/init.c
@@ -280,6 +280,7 @@ static const struct {
 #endif
   { "preservepermissions", &opt.preserve_perm,  cmd_boolean },
 #ifdef HAVE_SSL
+  { "ciphers",  &opt.tls_ciphers_string, cmd_string },
   { "privatekey",   &opt.private_key,   cmd_file },
   { "privatekeytype",   &opt.private_key_type,  cmd_cert_type },
 #endif

Re: [Bug-wget] How to intercept wget to extract the raw requests and the raw responses?

2018-02-14 Thread Ander Juaristi
Sort of... Would need further refinement.

sudo tcpdump -i  -A 'port http'
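
For example, something like this (the interface name is an example; -s 0
captures full packets):

sudo tcpdump -i eth0 -s 0 -A 'tcp port 80'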

Regards,

- AJ


> Sent: Sunday, 11 February 2018 at 0:35
> From: "Peng Yu" 
> To: "Tim Ruehsen" 
> CC: bug-wget@gnu.org
> Subject: Re: [Bug-wget] How to intercept wget to extract the raw requests and 
> the raw responses?
>
> On Sat, Feb 10, 2018 at 3:46 PM, Tim Ruehsen  wrote:
> > Am Samstag, den 10.02.2018, 10:34 -0600 schrieb Peng Yu:
> >> > Use 'wget -d -olog -qO- http://httpbin.org/get'.
> >>
> >> This requires the compilation of wget with debugging support. Is
> >> there
> >> any other way so that it does not require recompilation of wget?
> >> Thanks.
> >
> > Packet sniffing with e.g. tcpdump/wireshark
> 
> Could you show me some working examples about this?
> 
> > or using a local proxy.
> 
> Which one do you recommend that can give me the raw requests and the
> raw responses out-of-the-box? I have tried several proxies. None of
> them does so out-of-the-box.
> 
> -- 
> Regards,
> Peng
> 
> 



[Bug-wget] [bug #51666] Please hash the hostname in ~/.wget-hsts files

2017-08-18 Thread Ander Juaristi
Follow-up Comment #3, bug #51666 (project wget):

> We can do both, hash and still keep the readable to the user only

... hash and still keep the _files_ readable ...







[Bug-wget] [bug #51666] Please hash the hostname in ~/.wget-hsts files

2017-08-18 Thread Ander Juaristi
Follow-up Comment #2, bug #51666 (project wget):

I'm not generally against these kinds of small tweaks that do no harm and
slightly improve the user's privacy.

If Firefox doesn't do it, we don't care: it's their business and they will end
up doing it if users request that feature (maybe because they saw it in
wget).

Private SSH keys can be protected with a password if you want to.

> While we could hash anything, it would be way safer for you to protect your
complete home directory

We can do both, hash and still keep the readable to the user only. If the
overhead is not much I would go for it. That is the basis of every security
framework out there: if the benefits of having 2 security mechanisms instead
of only 1 outweigh the drawbacks, then implement 2 instead of 1.







Re: [Bug-wget] wget --secure-protocol=SSLv2 --certificate=/home/www/html/paj/key2.pem --certificate-type=PEM https://85.133.186.11:7878/ipgapp/services/IPGService?wsd

2017-08-18 Thread Ander Juaristi
I'm sure you'll understand it's difficult to troubleshoot that from our side
without having a copy of your client certificate.

I've visited the URL and *I think* the site issues a non-standard root (CA)
certificate, so if that's the case you would need to tell wget to accept that
cert as a CA. You do that with --ca-certificate.

However, from the error messages it seems more likely that either the cert file
you're using is invalid or malformed, or you entered an incorrect password.

On 06/08/17 16:50, Nasrollah Mohammadi wrote:
> wget --secure-protocol=SSLv2 --certificate=/home/www/html/paj/key2.pem
>--certificate-type=PEM
>https://85.133.186.11:7878/ipgapp/services/IPGService?wsdl
>--2017-08-06 22:22:07--
>https://85.133.186.11:7878/ipgapp/services/IPGService?wsdl
>Enter PEM pass phrase:
>OpenSSL: error:06065064:digital envelope
>routines:EVP_DecryptFinal_ex:bad decrypt
>OpenSSL: error:23077074:PKCS12 routines:PKCS12_pbe_crypt:pkcs12
>cipherfinal error
>OpenSSL: error:2306A075:PKCS12 routines:PKCS12_item_decrypt_d2i:pkcs12
>pbe crypt error
>OpenSSL: error:0907B00D:PEM routines:PEM_READ_BIO_PRIVATEKEY:ASN1 lib
>OpenSSL: error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM
>lib
>Disabling SSL due to encountered errors.
> 



Re: [Bug-wget] Shouldn't wget strip leading spaces from a URL?

2017-06-13 Thread Ander Juaristi


On 13/06/17 02:09, L A Walsh wrote:
> 
> 
> Dale R. Worley wrote:
>> L A Walsh  writes:
>>> W/cut+paste into target line, where URL is double-quoted.  More often
>>> than not, I find it safer to double-quote a URL than not, because, for
>>> example, shells react badly to embedded spaces, ampersands and
>>> question marks.
>>
>> But of course, no URL contains an embedded space.
> ---
> Why not?
> 
> John Mueller of Google posted a note about spaces in URLs on Google+.
> You know, the URLs that look like www.domain.com/file name goes here.html.
> 
> Should you fill those holes?
> 
> John Mueller of Google said "the answer is not "no"" when it comes to the
> question "Should you encode spaces in URLs as "%20", "+" or as a space (" ")?"
> 

But those are correctly handled by wget.

I guess the whole point of this thread is to handle leading (and probably also
trailing) spaces. And the easiest way to 'handle' them is to remove them. So we
need a patch that trims the URL, as Tim said (I think); it shouldn't be hard.

Other than that, there's nothing more to worry about, is there? Spaces within a
path/query are already correctly dealt with, and spaces in any other place
(htt ps://... !!!) are probably incorrect and should be reported as such.
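
A sketch of the kind of trimming being proposed (not actual wget code):

#include <ctype.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns a pointer
   to the first non-space character of 'url'. */
static char *
url_trim (char *url)
{
  char *end;

  while (*url && isspace ((unsigned char) *url))
    url++;

  end = url + strlen (url);
  while (end > url && isspace ((unsigned char) end[-1]))
    *--end = '\0';

  return url;
}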

> 
> But what would someone at google know?
> 
> 
> 
> 



Re: [Bug-wget] Shouldn't wget strip leading spaces from a URL?

2017-06-12 Thread Ander Juaristi
Your shell strips all the additional spaces between command-line arguments, so
it's effectively like a browser ;)

How are you running wget?

On 06/06/17 01:12, L A Walsh wrote:
> if wget gets leading spaces in a URL, it complains:
>  "  http://www.kernel.org/pub/linux/utils/util-linux/v2.30: Scheme missing."
> 
> Isn't it required for a web client to strip leading spaces from
> URLs?
> 
> 
> 



Re: [Bug-wget] HTTPS Python tests fail if localhost resolves to ::1

2017-06-03 Thread Ander Juaristi
Hi Tomas,

In Debian, where I work, localhost has long been mapped to both 127.0.0.1 and
::1 as well - at least since early 2015, when I first got involved with wget.
And those tests have always passed on my machine.

I'm sorry I have no clue why that is happening to you, but you could try adding
-4 to the command line to see if it's really related to that.

On 02/06/17 11:38, Tomas Hozza wrote:
> Hi.
> 
> In Fedora 26+, /etc/hosts lists "localhost" as a name for both 127.0.0.1 
> and ::1. This makes wget's test suite fail during build.
> 
> Failing tests:
> Test--https.py
> Test-pinnedpubkey-der-https.py
> Test-pinnedpubkey-hash-https.py
> Test-pinnedpubkey-pem-https.py
> 
> 
> [root@b28dfb71db2e sources]# cat testenv/Test--https.log 
> Python runtime initialized with LC_CTYPE=C (a locale with default ASCII 
> encoding), which may cause Unicode compatibility problems. Using C.UTF-8, 
> C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is 
> recommended.
> Setting --no-config (noconfig) to 1
> Setting --ca-certificate (cacertificate) to /sources/testenv/certs/ca-cert.pem
> DEBUG output created by Wget 1.19.1.68-5d4ad on linux-gnu.
> 
> Reading HSTS entries from /sources/testenv/Test--https.py-test/.wget-hsts
> URI encoding = 'ANSI_X3.4-1968'
> converted 'https://127.0.0.1:4/File1' (ANSI_X3.4-1968) -> 
> 'https://127.0.0.1:4/File1' (UTF-8)
> Converted file name 'File1' (UTF-8) -> 'File1' (ANSI_X3.4-1968)
> --2017-06-02 09:28:34--  https://127.0.0.1:4/File1
> Loaded CA certificate '/sources/testenv/certs/ca-cert.pem'
> Certificates loaded: 1
> Connecting to 127.0.0.1:4... connected.
> Created socket 3.
> Releasing 0x024ee050 (new refcount 0).
> Deleting unused 0x024ee050.
> The certificate's owner does not match hostname '127.0.0.1'
> URI encoding = 'ANSI_X3.4-1968'
> converted 'https://127.0.0.1:4/File2' (ANSI_X3.4-1968) -> 
> 'https://127.0.0.1:4/File2' (UTF-8)
> Converted file name 'File2' (UTF-8) -> 'File2' (ANSI_X3.4-1968)
> --2017-06-02 09:28:34--  https://127.0.0.1:4/File2
> Connecting to 127.0.0.1:4... connected.
> Created socket 3.
> Releasing 0x0278f1d0 (new refcount 0).
> Deleting unused 0x0278f1d0.
> The certificate's owner does not match hostname '127.0.0.1'
> Running Test Test--https.py
> /sources/src/wget --debug --no-config 
> --ca-certificate=/sources/testenv/certs/ca-cert.pem 
> https://127.0.0.1:4/File1 https://127.0.0.1:4/File2 
> ['/sources/src/wget', '--debug', '--no-config', 
> '--ca-certificate=/sources/testenv/certs/ca-cert.pem', 
> 'https://127.0.0.1:4/File1', 'https://127.0.0.1:4/File2']
> Error: Expected file File1 not found..
> Traceback (most recent call last):
>   File "./Test--https.py", line 53, in 
> protocols=Servers
>   File "/sources/testenv/test/http_test.py", line 41, in begin
> self.do_test()
>   File "/sources/testenv/test/base_test.py", line 187, in do_test
> self.post_hook_call()
>   File "/sources/testenv/test/base_test.py", line 206, in post_hook_call
> self.hook_call(self.post_configs, 'Post Test Function')
>   File "/sources/testenv/test/base_test.py", line 196, in hook_call
> conf.find_conf(conf_name)(conf_arg)(self)
>   File "/sources/testenv/conf/expected_files.py", line 54, in __call__
> raise TestFailed('Expected file %s not found.' % file.name)
> exc.test_failed.TestFailed: Expected file File1 not found.
> FAIL Test--https.py (exit status: 1)
> 
> I didn't have time to investigate this thoroughly yet, but I thought I'd let 
> you know in case the issue is obvious to anyone. I suspect there is a 
> mismatch between the address on which the HTTPS server runs and the 
> data in the certificate it uses.
> 
> Regards,
> Tomas
> 



Re: [Bug-wget] [GSoC] Git & Github workflow

2017-04-02 Thread Ander Juaristi
Hi,

This is certainly a really good contribution, and one that is often overlooked.

I've made a couple of revisions [0].

Regards,
- AJ

[0]
https://github.com/rockdaboot/wget2/wiki/Contributing-guide-for-GSoC-students/_history

On 02/04/17 15:31, Avinash Sonawane wrote:
> On Thu, Mar 23, 2017 at 9:08 PM, Avinash Sonawane  wrote:
> 
>> If we get helpful responses, I will add this "contributing guide for
>> GSoC students" to our wiki.
> 
> As promised earlier, I just added a "contributing guide for GSoC
> students" to our wiki.[0]
> 
> I hope this helps GSoC aspiring students to not hold back just because
> they weren't comfortable with git/Github. :)
> 
> It's still far from being perfect. So I invite your contributions.
> 
> Finally, I would like to thank Tim for his inputs. Thank you!
> 
> [0]https://github.com/rockdaboot/wget2/wiki/Contributing-guide-for-GSoC-students
> 
> Regards,
> Avinash Sonawane (rootKea)
> PICT, Pune
> https://rootkea.wordpress.com
> 



Re: [Bug-wget] [GSoC] Proposal | Design and Implementation of Test Suite Using Libmicrohttpd

2017-04-01 Thread Ander Juaristi
Hi there Didik,

I added some comments to the doc. Hope you can find some time to look at
them.

Best regards,
- AJ

On 30/03/17 21:25, Didik Setiawan wrote:
> Hi!
> 
> I have prepared a proposal for Google Summer of Code 2017. I decided to 
> change my project from my initial intention, considering that I am 
> currently working on a small patch covering the test suite area, which 
> has made me more familiar with the Wget2 codebase.
> I know this is a bit late, but please try to give your comments and 
> suggestions.
> 
> Here is a Google doc for my proposal.
> https://docs.google.com/document/d/1CHy-8_yKVYrBmkC9i4Sb7shpd1feec3Dqtp2WI2KY9M/edit?usp=sharing
> 
> Regards,
> Didik Setiawan
> 
> 





Re: [Bug-wget] [GSOC 2017] make check fails almost all the tests on fresh clone of wget2

2017-03-22 Thread Ander Juaristi
Did you pass any flags or env vars to configure?


On 22/03/17 07:09, Avinash Sonawane wrote:
> Hello!
> 
> As I have mentioned earlier, I have built and installed wget2 on my
> system. But I hadn't run `make check` since I wasn't planning to
> write unit tests.
> 
> But now as I am extending tests/test-parse-html.c to include new unit
> tests I did `make check`.
> 
> Now there are two problems:
> 1) make check is utterly slow. I can see my CPU usage going up to
> 1-2%. That's it! It's almost an hour or and it's still running with
> this output:
> FAIL: test
> FAIL: test-wget-1
> FAIL: test-restrict-ascii
> FAIL: test-i-http
> FAIL: test-i-https
> FAIL: test-np
> FAIL: test--spider-r
> FAIL: test-meta-robots
> FAIL: test-idn-robots
> FAIL: test-idn-meta
> FAIL: test-idn-cmd
> FAIL: test-iri
> FAIL: test-iri-percent
> FAIL: test-iri-list
> FAIL: test-iri-forced-remote
> FAIL: test-auth-basic
> PASS: test-parse-html
> ...
> 
> How much time does it take usually? I haven't configured to use Valgrind yet.
> 
> 2) All the tests before test-parse-html have failed and it's still
> running, so I have no idea how many more will fail. Let me remind you I'm
> performing `make check` on a fresh clone and haven't modified it in any
> manner. Is this expected behavior? What do the FAILed tests mean? Can
> we do something about them?
> 
> Any help will be deeply appreciated.
> 
> Thank you.
> 
> Regards,
> Avinash Sonawane (rootKea)
> PICT, Pune
> https://rootkea.wordpress.com
> 





Re: [Bug-wget] GSoC Project | Design and Implementation of a Framework for Plugins

2017-03-20 Thread Ander Juaristi
Hi all,

On 20/03/17 16:15, Tim Ruehsen wrote:
> One goal would be to make up data structures and an API that can be extended 
> without breaking compatibility between  wget2 and the plugin in the future. 
> E.g. a newer wget2 should still be able to work with an older plugin and vice 
> versa. 
> 

I would add something to this.

Maybe this is not 100% related to the plugin framework, but we need a
consistent API to manipulate all stages of an HTTP request (connection
establishment, SSL/TLS handshake, SSL/TLS cert verification, etc.) in a
single place, as I once told Tim off-list.

Have a look at #142 [0], for instance. Although it mentions HPKP, there
are other parts that would benefit from this as well.

I was thinking of an event-based framework (e.g. plugins subscribe to
'events' and get called when they happen). Tim, on the other hand,
prefers a data structure where you could access all the details of a
connection (e.g. something like 'wget_connection_t', with references to
'wget_tls_t', etc.), and this data structure gets passed to plugins and
they decide what to do.

[0] https://github.com/rockdaboot/wget2/issues/142
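
To make the two options concrete, a rough sketch (every type and function
name below is invented for illustration; nothing like this exists yet):

/* Option A: event-based. Plugins subscribe to named events. */
typedef void (*wget_plugin_cb) (void *event_data, void *user_data);
void wget_plugin_subscribe (const char *event,   /* e.g. "tls.cert_verify" */
                            wget_plugin_cb cb, void *user_data);

/* Option B: one connection object holding all the details, passed to
   the plugin, which inspects whatever stage it cares about. */
typedef struct {
  /* ... socket, timings, request/response state ... */
  struct wget_tls *tls;   /* TLS handshake and certificate details */
} wget_connection_t;

void wget_plugin_handle_connection (wget_connection_t *conn, void *user_data);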

> Just make up a sketch of your ideas and we (you and one of our mentors) will 
> improve/discuss that.
> 
> Regards, Tim
> 





Re: [Bug-wget] Vulnerability Report - CRLF Injection in Wget Host Part

2017-03-07 Thread Ander Juaristi
Hi Dale,

On 06/03/17 16:47, Dale R. Worley wrote:
> Orange Tsai  writes:
>> # This will work
>> $ wget 'http://127.0.0.1%0d%0aCookie%3a hi%0a/'
> 
> Not even considering the effect on headers, it's surprising that wget
> doesn't produce an immediate error, since
> "127.0.0.1%0d%0aCookie%3a hi%0a" is syntactically invalid as a host
> part.  Why doesn't wget's URL parser detect that?

Simply because it first splits the URL into several parts according to
the delimiters, and then decodes the percent-encoding.

Additionally, for the host part it also checks whether it's an IP address,
plus the IDNA stuff, but yeah, you raise a good point. Other than that, the
host part is treated similarly to the other parts.

At a quick glance I see RFC 1034 says a domain name should contain "any one
of the 52 alphabetic characters A through Z in upper case and a through
z in lower case", and digits, basically.

Do you think it's enough to just blacklist anything outside
[a-z0-9\.\-_], or is there something else to be done?
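
Something along these lines is what I have in mind (a sketch, not actual
wget code):

#include <stdbool.h>

/* Reject any decoded host character outside [A-Za-z0-9.-_]. */
static bool
host_is_valid (const char *host)
{
  for (; *host; host++)
    {
      char c = *host;
      if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '.' || c == '-' || c == '_'))
        return false;   /* e.g. CR/LF smuggled in via %0d%0a */
    }
  return true;
}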

> I'm sure the new
> patch is an improvement, but it's surprising that the old code didn't
> detect that was an invalid URL anyway, since it contains characters that
> aren't permitted in those locations.
> 
> Dale
> 





Re: [Bug-wget] How to download Wget software in windows?

2017-03-04 Thread Ander Juaristi
https://eternallybored.org/misc/wget/

On 03/03/17 00:57, Qu Bo wrote:
> It would be helpful to give me some guidance. Many thanks!
> Bo
> 
> Sent from Windows Mail
> 



[Bug-wget] [bug #50250] not reading ~/.netrc anymore

2017-03-01 Thread Ander Juaristi
Follow-up Comment #1, bug #50250 (project wget):

Does d061e55 [1] fix this?

[1]
http://git.savannah.gnu.org/cgit/wget.git/commit/?id=d061e553a184871b383649c1ea8c921c62164905





Re: [Bug-wget] ot: clicking email links advice

2017-01-05 Thread Ander Juaristi
Hi,

On 28/12/16 05:57, voy...@sbt.net.au wrote:
> 
> is there a way to run wget with that URL and tell it to 'press' one of
> the buttons?

Not directly as you describe. Wget does not submit web forms.

You would need to write an external application to parse the HTML,
generate the target link and then feed that link to wget.

Or,

You could use wget2, which we're designing as a library, although it's
still in pre-alpha. It has functions to extract links from an HTML
document (see the example in [1]), although I don't know if it can
extract URLs from form fields as well, which is what I guess you need.
Maybe @Tim can give more details on this.

[1]
https://github.com/rockdaboot/wget2/blob/master/examples/print_html_urls.c

> 
> thanks for any pointer or advice
> 
> V
> 
> 





Re: [Bug-wget] Favicon is not downloaded (Suggestion for improvement)

2017-01-04 Thread Ander Juaristi
Yeah, why not. I can't think of any reason why it would break current
wget deployments.

I know Gecko does that: it asks for /favicon.ico just after / if no
favicon was defined in the HTML.

On 25/12/16 15:42, Павел Серегов wrote:
> Hi.
> 
> Often there is no code for the favicon (in index.html), but the site has one.
> 
> My suggestion:
> If wget -m is used, it should also download http://example.com/favicon.ico
> 
> How do you like the idea?
> 





Re: [Bug-wget] Wget wall clock time is very high

2016-12-14 Thread Ander Juaristi
Hi there,

On 14/12/16 12:10, Debopam Bhattacherjee wrote:
> Hi,
> 
> I try to download a webpage along with it dependencies using the following
> command:
> wget -p -k -H -e robots=off --header="Accept:
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
> --user-agent="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0)
> Gecko/20100101 Firefox/50.0" -P test1234 http://www.stanford.edu
> 
> The total download time is 1.4 seconds while the wall clock time is 6.8
> seconds, which is much higher. Chrome, in comparison, downloads and renders
> everything in 2-3 seconds.
> 
> Why is the wall clock time so high and how can it be reduced?

Probably because Chrome spreads download jobs across several threads,
while wget does it all in the same thread.

> 
> Thanks!
> 

Regards,
- AJ





Re: [Bug-wget] wgetrc: "no-hsts" fails

2016-11-29 Thread Ander Juaristi
Hi,

On 29/11/16 22:26, ilf wrote:
> Tim Rühsen:
>>> --no-hsts works as an interactive option, but not in wgetrc.
>> hsts=0
> 
> Thanks, that works!
> 
> 1. Why is this hsts=0 instead of hsts=off? Are there more possible
> values than boolean?
> 

hsts=off and hsts=no work as well

> 2. All other options I use are identical arguments on command-line and
> in wgetrc. Why is this different? How about adding a note to the
> man-page like "use --no-hsts or hsts=0"?

What do you mean? To my understanding, the HSTS options are no
different from the others.

It's the same pattern as with the rest of the options:

--hsts: enable HSTS (enabled by default so no need to specify it)
--no-hsts: disable HSTS

And in .wgetrc:

hsts=1/on/yes: same as --hsts
hsts=0/off/no: same as --no-hsts

This is all explained in the wget(1) man page (doc/wget.texi in the
source tree), in the section "Option syntax".

> 
> Thanks, and keep up the good work!
> 




[Bug-wget] [bug #49458] Please search txt files named ".ram" for links

2016-10-27 Thread Ander Juaristi
Follow-up Comment #1, bug #49458 (project wget):

I don't exactly know what you're asking, but I don't think it would be a good
fit for wget given that you could easily automate it with a shell script.

The closest enhancement I can come up with is to make -i accept multiple files,
which, to the best of my knowledge, it does not do yet.

E.g. wget -i file1.ram,file2.ram,foo.txt,bar.html

What do you all think?





Re: [Bug-wget] C developer keen to help

2016-10-02 Thread Ander Juaristi
Sure! You're more than welcome to join us!

If you're looking for something to work on, you could browse the list of
open bugs [1], take one that suits you, and drop us a line so that we
know someone is working on it. When you're working on a particular
issue, please also write a test case at the end to ensure the bug doesn't
reappear in future releases. Test cases are written in Python.
They are in the folder testenv/.

We also have unit tests (well, sort of). You can write C code guarded by
the TESTING macro that tests each function individually. Take a look
at hsts.c:608 [2], for instance (a sketch of the pattern follows below).
You run the tests with 'make check'. It will run all the tests under
testenv/, all the "unit tests", and the old test suite under tests/.
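
Roughly, the pattern for those unit tests looks like this (a simplified
sketch of the MinUnit-style macro in src/test.h):

#ifdef TESTING
#include "test.h"

/* Each unit test returns NULL on success, or a failure message;
   mu_assert() returns its message from this function on failure. */
const char *
test_hsts_example (void)
{
  mu_assert ("1 + 1 must equal 2", 1 + 1 == 2);
  return NULL;
}
#endif /* TESTING */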

Another project that's still pending is the migration of the old test
suite (tests/), written in Perl, to the new Python-based scheme. You could
also give that a try if you like, although it will arguably take more time.

If you get stuck or have any doubt in anything, please get in touch.
We're friendly (well, usually ;D).

[1] https://savannah.gnu.org/bugs/?group=wget
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/hsts.c#n608

On 02/10/16 11:28, Piotr Zacharzewski wrote:
> Hi all,
> 
> My name is Piotr Zacharzewski, I am searching for a free software
> project to get involved in. You can read more about me on
> https://savannah.gnu.org/users/pzach. I must say I've never contributed
> to a project this way, although I have experience in collaborating with
> people on software projects. I have around 2 years of practical C
> experience but only very basic TCP/IP knowledge from classes. I tend to
> learn fast though and I have some spare time in the evenings, weekends
> regularly. Do you think I could join in and help out?
> 
> kind regards
> 
> Peter
> 
> 



Re: [Bug-wget] WordPress downloaded site

2016-09-18 Thread Ander Juaristi
Hi Michael,

On 13/09/16 13:54, Michael wrote:
> 
> Hello wget people!
> 
> I have downloaded my WordPress site using wget and it works like a charm.
> (thank you)!
> 
> As WordPress uses directory names as meaningful file names, wget creates
> a directory per page and puts an index.html in it with the correct
> relative directory path (../).
> 
> I want this behavior to be kept for the top directory only; when wget
> recurses, it should create .html files instead (with proper links in the
> site, of course).
> 

So, just to clarify whether I understood you well, you want this:

/
`- foo.html
`- bar.html

Instead of this:

/
`- foo/
   `- index.html
`- bar/
   `- index.html

Right?

> Is this feature exists?
>

Not to the best of my knowledge.

> If not, how complicated it is to program it?

It shouldn't take more than a week, roughly, for someone who is not
familiar with the code and doesn't work on wget full-time.

The best way of estimating how this could be done, is to do it yourself
and then share your patch with us. That way, you also provide the
community with something tangible, and we can more easily evaluate
whether we want that feature in wget.

> Best regards,
> 
> Michael
> 






Re: [Bug-wget] mirror delete missing files

2016-08-24 Thread Ander Juaristi
Hi,

On 23/08/16 14:23, Darshit Shah wrote:
> Hi,
> 
> Nope, as far as I'm aware, there is absolutely no plan to add such a
> feature to Wget. Wget is primarily a tool for downloading web resources,
> not for maintaining a sync. Sure, you can use it to archive a website,
> to keep it in sync, there are other, more specialised utilities such as
> `rsync`.
> 
> However, if someone were to contribute a patch, along with relevant test
> cases, we might consider such a feature.

Just dropping an idea.

Sure there is demand for such a feature out there, but my inner voices
keep on telling me that we should not go too far from what wget was
intended to be - as you say, a tool to download web resources.

How about a feature that logs (in a CSV, for instance) which files were
downloaded, and where they were saved? Then you could easily parse that
file with something like Python.

Jookia took a similar approach to log URLs that were rejected [1].

[1] http://lists.gnu.org/archive/html/bug-wget/2015-07/msg00094.html
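
For example, a run could leave behind something like this (the format is
hypothetical, just to illustrate):

url,local_path
http://example.com/index.html,example.com/index.html
http://example.com/logo.png,example.com/logo.png

A small script could then compare that list against the local tree and
delete whatever is no longer listed.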

> * 48mgwi+3jphxhave2...@guerrillamail.com
> <48mgwi+3jphxhave2...@guerrillamail.com> [160823 11:18]:
>> Hi all,
>>
>>I'm using wget to mirror (recursively) a remote directory over
>> http. I think wget is very nice for this job, but unfortunately it
>> lacks an option to delete files (locally) that were downloaded earlier
>> but are no longer available on the remote server. Right now wget just
>> ignores these files and leaves the local copy unmodified. Is there any
>> chance that this option will be added to wget? I've found post online
>> by people with this very same problem since at least 2004...
>>
>> Thanks
> 





Re: [Bug-wget] Wget - acess list bypass / race condition PoC

2016-08-17 Thread Ander Juaristi
I was thinking we could rename .php extensions to .phps, but it's all the
same thing in the end, and adding a harmless extension is even better, since
it applies to any kind of file, and I've seen some broken servers that
actually execute .phps files.

So, this is what I would do:

1. Write temporary files with 600 perms, and make sure they're owned
by the running user and group. qmail goes even further [1] by not
letting root run, but I would not do that here.
2. Use mkostemp() to generate a unique filename and give it a
harmless extension (like Mozilla's .part). We already have unique_name()
in utils.c, although it returns the file name untouched if it does not
exist. We should do some research on whether we could reuse parts of it.
(See the sketch after this list.)
3. Place them in /tmp, or even better, in ~/.wget-tempfiles, or
something like that.

There's a patch by Tim somewhere in this list that already does 1 (but
please, remove the braces ;D).

It also comes to mind that, instead of writing each temp file to its own
file, we could put them all in the same file (with O_APPEND). But a) we
need a way to tell them apart later, and b) it may cause problems on
NFS, according to open(2).

[1] http://cr.yp.to/qmail/guarantee.html
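
A sketch of points 1 and 2 above combined (mkostemps() is a GNU/BSD
extension that creates the file with mode 0600; the '.part' suffix and the
/tmp location are just assumptions):

#define _GNU_SOURCE     /* for mkostemps() */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

/* Create a unique temp file such as /tmp/wget-abc123.part, opened 0600,
   with a harmless suffix. Returns the open fd, or -1 on error. */
static int
open_tempfile (char *path, size_t size)
{
  snprintf (path, size, "/tmp/wget-XXXXXX.part");
  return mkostemps (path, 5 /* strlen(".part") */, O_CLOEXEC);
}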

On 15/08/16 18:31, Tim Rühsen wrote:
> On Montag, 15. August 2016 10:02:55 CEST moparisthebest wrote:
>> Hello,
>>
>> I find it extremely hard to call this a wget vulnerability when SO many
>> other things are wrong with that 'vulnerable code' implementation it
>> isn't even funny:
>>
>> 1. The image_importer.php script takes a single argument, why would it
>> download with the recursive switch turned on?  Isn't that clearly a bug
>> in the php script?  Has a php script like this that downloads all files
>> from a website of a particular extension ever been observed in the wild?
>>
>> 2. A *well* configured server would have a whitelist of .php files it
>> will execute, making it immune to this.  A *decently* configured server
>> would always at a minimum make sure they don't execute code in
>> directories with user provided uploads in them.  So it's additionally a
>> bug in the server configuration. (incidentally every php package I've
>> downloaded has at minimum a .htaccess in upload directories to prevent
>> this kind of thing with apache)
>>
>> It seems to me like there has always been plenty of ways to shoot
>> yourself in the foot with PHP, and this is just another iteration on a
>> theme.
> 
> Hi,
> 
> this is absolutely true and your points were the first things that came to my 
> mind when reading the original post.
> 
> But there is also non-obvious wget behavior in creating those (temp) files in 
> the filesystem. And there is also a long history of attack vectors introduced 
> by temp files as well.
> 
> Today the maintainers discussed a few possible fixes, all with pros and cons.
> I would like to list them here, in case someone likes to comment:
> 
> 1. Rewrite code to keep temp files in memory.
> Too complex, needs a redesign of wget. And has been done for wget2...
> 
> 2. Add a harmless extension to the file names.
> Possible name collision with wanted files.
> Possible name length issues, have to be worked around.
> 
> 3. Using file mode 0 (no flags at all).
> Short vulnerability when changing modes to write/read the data.
> 
> 4. Using O_TMPFILE for open().
> Just for Linux, not for every filesystem available.
> 
> 5. Using mkostemp().
> Possible name collision with wanted files (which would be unexpectedly named 
> as 
> *.1 in case of a collision). At least the chance for a collision seems very 
> low.
> 
> Any thoughts or other ideas ?
> 
> Regards, Tim
> 





Re: [Bug-wget] What ought to be a simple use of wget

2016-08-02 Thread Ander Juaristi
Hi Dale,

I'm seeing it always redirects to www.iana.org/protocols

Would -A protocols work for you?

e.g
wget --mirror --convert-links --no-parent --page-requisites -A
protocols http://www.iana.org/protocols

On 02/08/16 18:38, Dale R. Worley wrote:
> I want to make a local copy of the "IANA protocol assignments" web
> pages.  It seems to me that this ought to be a simple use of wget in
> recursive mode, and indeed, it seems like someone else must have run
> into this need before.  But I can't get a combination of wget options
> that has the behavior I want.
> 
> The goal is to make a local file tree that mirrors these URLs:
> 
> http://www.iana.org/assignments/index.html
> (That page should be in a file named 'index.html'.)
> 
> every HTML page under http://www.iana.org/assignments/ that can be
> reached from index.html
> 
> page requisites for those pages, even if they aren't under
> http://www.iana.org/assignments/
> 
> The interference comes from all the stuff under http://www.iana.org that
> is not under http://www.iana.org/assignments, but which is pointed to by
> the pages listed above.
> 
> To resolve the simple problem, it appears that --page-requisites does
> fetch the page requisites, even if they aren't under
> http://www.iana.org/assignments/.  So that part of the solution works
> fine.
> 
> But I can't figure out the right combination of options to fetch the
> HTML files that I want:
> 
> 
> wget --mirror --convert-links --no-parent --page-requisites 
> http://www.iana.org/assignments/index.html
> Follows links outside of /assignments/.
> 
> wget --mirror --convert-links --exclude-directories=/ --page-requisites 
> http://www.iana.org/assignments/index.html
> This doesn't recurse beyond index.html.
> 
> wget --mirror --convert-links --no-parent --page-requisites 
> http://www.iana.org/assignments
> Follows links outside of /assignments/.
> 
> wget --mirror --convert-links --exclude-directories=/ --page-requisites 
> http://www.iana.org/assignments
> This doesn't recurse beyond index.html.
> 
> wget --mirror --convert-links --no-parent --page-requisites 
> http://www.iana.org/assignments/
> This doesn't recurse beyond index.html.
> 
> wget --mirror --convert-links --exclude-directories=/ --page-requisites 
> http://www.iana.org/assignments/
> This doesn't recurse beyond index.html.
> 
> 
> I'm hoping that this is a known problem and someone can tell me the
> answer without having to think about it.
> 
> I also think the documentation could be made clearer in some places, but
> that can wait.
> 
> Dale
> 





[Bug-wget] [bug #48634] MAC OS : Building with "--with-ssl=openssl" error

2016-07-30 Thread Ander Juaristi
Follow-up Comment #4, bug #48634 (project wget):

> Also won't it prevent wget from using the ssl protocol? 

Doesn't invoking ./configure without params default to GNUTLS? What's the
status of GNUTLS on Mac OS X?





Re: [Bug-wget] [PATCH] Keep fetched URLs in POSIX extended attributes

2016-07-22 Thread Ander Juaristi
Also, shouldn't we print something when xattrs are not supported? On
Linux, fsetxattr() returns -1 when that is the case.


Extended attributes are not supported by all filesystems, e.g.:

dd if=/dev/zero bs=64M count=1 of=./minix.img
mkfs.minix ./minix.img
mount -o loop -t minix minix.img /tmp/minix-test

And then:
wget --xattr -O /tmp/minix-test/archive.html http://archive.org
getfattr -d /tmp/minix-test/archive.html


Yeah, Minix is not used in the real world, but NTFS is (sometimes in the
form of portable USB drives) and I think it also does not support
extended attributes. Didn't try it - more hassle to set up in 5 minutes.

If we trust Wikipedia [1], other filesystems from the MS world (such as
FAT) do not support xattrs either.


[1] https://en.wikipedia.org/wiki/Extended_file_attributes#Linux

On 21/07/16 06:33, Sean Burford wrote:
> Hi,
> 
> I find it useful to keep track of where files are downloaded from.  POSIX
> extended attributes provide a lightweight portable method of keeping this
> information across Linux, OS/X, FreeBSD and many other platforms.
> 
> This compliments wget's existing WARC support, which serves a related but
> different use case closer to tcpdump or tar for web pages.  Extended
> attributes can provide a quick answer to "where did I get this file from
> again?"
> 
> This patch changes:
> *   autoconf detects whether extended attributes are available and enables
> the code if they are.
> *   The new flags --xattr and --no-xattr control whether xattr is enabled.
> *   The new command "xattr = (on|off)" can be used in ~/.wgetrc or
> /etc/wgetrc
> *   The original and redirected URLs are recorded as shown below.
> *   This works for both single fetches and recursive mode.
> 
> Here is an example, where http://archive.org redirects to
> https://archive.org:
> $ wget --xattr http://archive.org
> ...
> $ getfattr -d index.html
> user.xdg.origin.url="https://archive.org/";
> user.xdg.referrer.url="http://archive.org/";
> 
> These attributes were chosen based on those stored by Google Chrome (
> https://bugs.chromium.org/p/chromium/issues/detail?id=45903) and curl (
> https://github.com/curl/curl/blob/master/src/tool_xattr.c)
> 





Re: [Bug-wget] [PATCH] Keep fetched URLs in POSIX extended attributes

2016-07-22 Thread Ander Juaristi
Hi Sean,

Great patch!

But I just had some unimportant comments about coding style ;D

On 21/07/16 06:33, Sean Burford wrote:
> diff --git a/src/ftp.c b/src/ftp.c
> index 88a9777..27d90d6 100644
> --- a/src/ftp.c
> +++ b/src/ftp.c
> @@ -52,6 +52,9 @@ as that of the covered work.  */
>  #include "recur.h"  /* for INFINITE_RECURSION */
>  #include "warc.h"
>  #include "c-strcase.h"
> +#ifdef ENABLE_XATTR
> +#include "xattr.h"
> +#endif
>  
>  #ifdef __VMS
>  # include "vms.h"
> @@ -1546,6 +1549,13 @@ Error in server response, closing control 
> connection.\n"));
>tmrate = retr_rate (rd_size, con->dltime);
>total_download_time += con->dltime;
>  
> +#ifdef ENABLE_XATTR
> +  if (opt.enable_xattr)
> +{
> +  set_file_metadata (u->url, NULL, fp);
> +}
> +#endif
> +

remove the braces, and indent the if body by two spaces

>fd_close (local_sock);
>/* Close the local file.  */
>if (!output_stream || con->cmd & DO_LIST)
> diff --git a/src/http.c b/src/http.c
> index 7e60a07..0cd142c 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -66,6 +66,9 @@ as that of the covered work.  */
>  # include "metalink.h"
>  # include "xstrndup.h"
>  #endif
> +#ifdef ENABLE_XATTR
> +#include "xattr.h"
> +#endif
>  
>  #ifdef TESTING
>  #include "test.h"
> @@ -2892,8 +2895,8 @@ fail:
> If PROXY is non-NULL, the connection will be made to the proxy
> server, and u->url will be requested.  */
>  static uerr_t
> -gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
> - struct iri *iri, int count)
> +gethttp (struct url *u, struct url *original_url, struct http_stat *hs,
> + int *dt, struct url *proxy, struct iri *iri, int count)
>  {
>struct request *req = NULL;
>  
> @@ -3754,6 +3757,20 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, 
> struct url *proxy,
>goto cleanup;
>  }
>  
> +#ifdef ENABLE_XATTR
> +  if (opt.enable_xattr)
> +{
> +  if (original_url != u)
> +{
> +  set_file_metadata (u->url, original_url->url, fp);
> +}
> +  else
> +{
> +  set_file_metadata (u->url, NULL, fp);
> +}
> +}
> +#endif
> +

likewise, for the inner if-else

>err = read_response_body (hs, sock, fp, contlen, contrange,
>  chunked_transfer_encoding,
>  u->url, warc_timestamp_str,
> @@ -3972,7 +3989,7 @@ http_loop (struct url *u, struct url *original_url, 
> char **newloc,
>  *dt &= ~SEND_NOCACHE;
>  
>/* Try fetching the document, or at least its head.  */
> -  err = gethttp (u, &hstat, dt, proxy, iri, count);
> +  err = gethttp (u, original_url, &hstat, dt, proxy, iri, count);
>  
>/* Time?  */
>tms = datetime_str (time (NULL));

And here,

> diff --git a/src/xattr.c b/src/xattr.c
> new file mode 100644
> index 000..360b032
> --- /dev/null
> +++ b/src/xattr.c
> @@ -0,0 +1,71 @@
> +/* xattr.h -- POSIX Extended Attribute support.
> +
> +   Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program; if not, see .  */
> +
> +#include "wget.h"
> +
> +#include 
> +#include 
> +
> +#include "log.h"
> +#include "xattr.h"
> +
> +#ifdef USE_XATTR
> +
> +static int
> +write_xattr_metadata (const char *name, const char *value, FILE *fp) {
> +  int retval = -1;
> +  if (name && value && fp)
> +{
> +  retval = fsetxattr (fileno(fp), name, value, strlen(value), 0);

we use spaces before the parentheses:
fileno (fp) and strlen (value)

> +  /* FreeBSD's extattr_set_fd returns the length of the extended 
> attribute. */
> +  retval = (retval < 0)? retval : 0;

and before the '?'

> +}
> +  return retval;
> +}
> +
> +#else /* USE_XATTR */
> +






Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-06-26 Thread Ander Juaristi
Hi all,

On 18/06/16 10:54, Gisle Vanem wrote:
> Eli Zaretskii wrote:
> 
>> IMO, this test should be bypassed on Windows.  The "world" part in
>> "world-writeable" is a Unix-centric notion, and its translation into
>> MS-Windows ACLs is non-trivial (read: "impossible").  (For example,
>> your "non-world-writeable" file is accessible to certain users and
>> groups of users on Windows, other than Administrator.)  So the sanest
>> solution for this is simply not to make this test on Windows.
> 
> Makes sense. I agree.
> 

Patch attached.

We still check whether the file exists.

Best regards,
- AJ
From 6c8abe30eb39ad4313a851f9b46457249cf5e726 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Sun, 26 Jun 2016 17:43:28 +0200
Subject: [PATCH] Bypass world-writable checks on Windows

 * src/hsts.c (hsts_file_access_valid): we should check for "world-writable"
   files only on Unix-based systems. It's difficult to mimic the same behavior
   on Windows, so it's better to just not do it.

Reported-by: Gisle Vanem 
Reported-by: Eli Zaretskii 
---
 src/hsts.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/hsts.c b/src/hsts.c
index 4d748ac..a0087a6 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -348,7 +348,15 @@ hsts_file_access_valid (const char *filename)
   if (stat (filename, &st) == -1)
 return false;
 
-  return !(st.st_mode & S_IWOTH) && S_ISREG (st.st_mode);
+  return
+#ifndef WINDOWS
+  /*
+   * The world-writable concept is a Unix-centric notion.
+   * We bypass this test on Windows.
+   */
+  !(st.st_mode & S_IWOTH) &&
+#endif
+  S_ISREG (st.st_mode);
 }
 
 /* HSTS API */
-- 
2.1.4





Re: [Bug-wget] 'Reading HSTS entries from' bug

2016-06-22 Thread Ander Juaristi
Hi,

On 20/06/16 16:12, . wrote:
> it was not fixed completely for v1.18

I cannot reproduce it after the patch was merged upstream. The 'saving
HSTS entries' text only appears when the HSTS file is updated.

I do see, however, that it still says 'Reading HSTS entries from...' in
the beginning, even for non-HSTS hosts, and FTP. Is this what you mean?

The HSTS entries are read at the beginning, long before the URI is
actually parsed. The only solution I can come up with is to check
whether the scheme is FTP or FTPS, and not load the HSTS database at
all if so. I'm forwarding this to the list for discussion because I
don't know whether it's worth the effort. A rough sketch of the idea
follows.
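To make it concrete, the check would be something along these lines (a
hypothetical sketch only: it assumes the parsed 'struct url *u' is
available before the HSTS store is opened, which is exactly what the
current code flow does not provide; get_hsts_database() and
hsts_store_open() are the existing routines):

  /* Hypothetical: skip the HSTS store entirely for FTP(S) URLs. */
  if (u->scheme != SCHEME_FTP && u->scheme != SCHEME_FTPS)
    hsts_store = hsts_store_open (get_hsts_database ());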

Best regards,
- AJ




[Bug-wget] [bug #48232] Sometimes wget restarts download from the beginning, even if the server supports resumed downloads

2016-06-16 Thread Ander Juaristi
Follow-up Comment #1, bug #48232 (project wget):

See the attached patch. It fixes the issue on my side.

I'm especially afraid of having introduced an off-by-one error, so please
remind me to write a test case for this next week. I just have no time now.

(file #37503)
___

Additional Item Attachment:

File name: 0001-Do-not-overwrite-restval-if-len-is-smaller.patch Size:1 KB






Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-25 Thread Ander Juaristi
Hi Tim,

On 24/05/16 14:49, Tim Ruehsen wrote:
> 
> I meant e.g. in hsts_match(), you eventually remove an expired entry, but you 
> do not set store->changed. Thus on exit, the database wouldn't be updated.
> I think it is good to keep the database as small as possible - following 
> invocations of wget have less to read.
> 

True. I could not reproduce the behavior you describe, but I see the
'changed' field is not being updated in hsts_match(). I knew I was
forgetting something!

Please find a new patch attached, and tell me if there's still something
left.

> 
> Sounds good, thanks for explaining.
> My thought was to update the DB every time we get a STS header, since max_age 
> is relative to the current time. But your approach is much more elegant in 
> respect to the number of writes (disk&cpu usage).

Actually, I blindly followed the RFC.

But yes, I guess that was one of the reasons the RFC authors did it that
way.

> 
> Tim
> 

Best regards,

- AJ
From 931db7e9fd0a55225e3e2d75fb3fdf328cb565bc Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Tue, 24 May 2016 11:14:38 +0200
Subject: [PATCH] Correct HSTS debug message

 * src/main.c (save_hsts): save the in-memory HSTS database to a file
   only if something changed.
 * src/hsts.c (struct hsts_store): new field 'changed'.
   (hsts_match): update field 'changed' accordingly.
   (hsts_store_entry): update field 'changed' accordingly.
   (hsts_store_has_changed): new function.
 * src/hsts.h (hsts_store_has_changed): new function.
---
 src/hsts.c | 23 +--
 src/hsts.h |  1 +
 src/main.c |  8 +---
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index d5e0bee..be824ea 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -53,6 +53,7 @@ as that of the covered work.  */
 struct hsts_store {
   struct hash_table *table;
   time_t last_mtime;
+  bool changed;
 };
 
 struct hsts_kh {
@@ -370,10 +371,14 @@ hsts_match (hsts_store_t store, struct url *u)
   if (u->port == 80)
 u->port = 443;
   url_changed = true;
+  store->changed = true;
 }
 }
   else
-hsts_remove_entry (store, kh);
+{
+  hsts_remove_entry (store, kh);
+  store->changed = true;
+}
 }
   xfree (kh->host);
 }
@@ -423,7 +428,10 @@ hsts_store_entry (hsts_store_t store,
   if (entry && match == CONGRUENT_MATCH)
 {
   if (max_age == 0)
-hsts_remove_entry (store, kh);
+{
+  hsts_remove_entry (store, kh);
+  store->changed = true;
+}
   else if (max_age > 0)
 {
   if (entry->max_age != max_age ||
@@ -436,6 +444,8 @@ hsts_store_entry (hsts_store_t store,
 entry->created = t;
   entry->max_age = max_age;
   entry->include_subdomains = include_subdomains;
+
+  store->changed = true;
 }
 }
   /* we ignore negative max_ages */
@@ -450,6 +460,8 @@ hsts_store_entry (hsts_store_t store,
  happen we got a non-existent entry with max_age == 0.
   */
   result = hsts_add_entry (store, host, port, max_age, include_subdomains);
+  if (result)
+store->changed = true;
 }
   /* we ignore new entries with max_age == 0 */
   xfree (kh->host);
@@ -470,6 +482,7 @@ hsts_store_open (const char *filename)
   store = xnew0 (struct hsts_store);
   store->table = hash_table_new (0, hsts_hash_func, hsts_cmp_func);
   store->last_mtime = 0;
+  store->changed = false;
 
   if (file_exists_p (filename))
 {
@@ -531,6 +544,12 @@ hsts_store_save (hsts_store_t store, const char *filename)
 }
 }
 
+bool
+hsts_store_has_changed (hsts_store_t store)
+{
+  return (store ? store->changed : false);
+}
+
 void
 hsts_store_close (hsts_store_t store)
 {
diff --git a/src/hsts.h b/src/hsts.h
index 9a7043d..6e5bbc8 100644
--- a/src/hsts.h
+++ b/src/hsts.h
@@ -43,6 +43,7 @@ hsts_store_t hsts_store_open (const char *);
 
 void hsts_store_save (hsts_store_t, const char *);
 void hsts_store_close (hsts_store_t);
+bool hsts_store_has_changed (hsts_store_t);
 
 bool hsts_store_entry (hsts_store_t,
enum url_scheme, const char *, int,
diff --git a/src/main.c b/src/main.c
index ed050a5..e7d5c66 100644
--- a/src/main.c
+++ b/src/main.c
@@ -204,10 +204,12 @@ save_hsts (void)
 {
   char *filename = get_hsts_database ();
 
-  if (filename)
-DEBUGP (("Saving HSTS entries to %s\n", filename));
+  if (filename && hsts_store_has_changed (hsts_store))
+{
+  DEBUGP (("Saving HSTS entries to %s\n", filename));
+  hsts_store_save (hsts_store, filename);
+}
 
-  hsts_store_save (hsts_store, filename);
   hsts_store_close (hsts_store);
 
   xfree (filename);
-- 
2.1.4

Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-05-25 Thread Ander Juaristi
Hi,

On 24/05/16 21:10, Tim Rühsen wrote:
> 
> Hi Ander,
> 
> could you rearrange the code in hsts_store_open() a bit to
> - avoid double calling file_exists_p (filename)
> - reduce the scope of 'st' and 'fp'
> 
> if (file_exists_p (filename)) {
>   if (hsts_file_access_valid (filename)) {
> struct_stat st;
> FILE *fp = fopen (filename, "r");
> ...
>   } else {
> ...
>   }
> 
> out:
>   return store;
> }

How about this?

> 
> Regards, Tim
> 

Regards,

- AJ
From 94bcb5d1c3928b76863783c7f4daee215870a959 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Wed, 6 Apr 2016 12:55:17 +0200
Subject: [PATCH] Check the HSTS file is not world-writable

 * hsts.c (hsts_file_access_valid): check that the file is a regular
   file, and that it's not world-writable.
   (hsts_store_open): if the HSTS database file does not meet the
   above requirements, disable HSTS at all.
---
 src/hsts.c | 54 ++
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index d5e0bee..4be7f0d 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -334,6 +334,22 @@ hsts_store_dump (hsts_store_t store, FILE *fp)
 }
 }
 
+/*
+ * Test:
+ *  - The file is a regular file (ie. not a symlink), and
+ *  - The file is not world-writable.
+ */
+static bool
+hsts_file_access_valid (const char *filename)
+{
+  struct_stat st;
+
+  if (stat (filename, &st) == -1)
+return false;
+
+  return !(st.st_mode & S_IWOTH) && S_ISREG (st.st_mode);
+}
+
 /* HSTS API */
 
 /*
@@ -464,8 +480,6 @@ hsts_store_t
 hsts_store_open (const char *filename)
 {
   hsts_store_t store = NULL;
-  struct_stat st;
-  FILE *fp = NULL;
 
   store = xnew0 (struct hsts_store);
   store->table = hash_table_new (0, hsts_hash_func, hsts_cmp_func);
@@ -473,24 +487,40 @@ hsts_store_open (const char *filename)
 
   if (file_exists_p (filename))
 {
-  fp = fopen (filename, "r");
+  if (hsts_file_access_valid (filename))
+{
+  struct_stat st;
+  FILE *fp = fopen (filename, "r");
 
-  if (!fp || !hsts_read_database (store, fp, false))
+  if (!fp || !hsts_read_database (store, fp, false))
+{
+  /* abort! */
+  hsts_store_close (store);
+  xfree (store);
+  fclose (fp);
+  goto out;
+}
+
+  if (fstat (fileno (fp), &st) == 0)
+store->last_mtime = st.st_mtime;
+
+  fclose (fp);
+}
+  else
 {
-  /* abort! */
+  /*
+   * If we're not reading the HSTS database,
+   * then by all means act as if HSTS was disabled.
+   */
   hsts_store_close (store);
   xfree (store);
-  goto out;
-}
 
-  if (fstat (fileno (fp), &st) == 0)
-store->last_mtime = st.st_mtime;
+  logprintf (LOG_NOTQUIET, "Will not apply HSTS. "
+ "The HSTS database must be a regular and non-world-writable file.\n");
+}
 }
 
 out:
-  if (fp)
-fclose (fp);
-
   return store;
 }
 
-- 
2.1.4



Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-05-24 Thread Ander Juaristi
Hi,

On 11/04/16 16:51, Tim Ruehsen wrote:
> 
> Did you consider Giuseppe's suggestion ?
> "can the file_exists_p check just be moved to hsts_file_access_valid that
> doesn't return an error on ENOENT?  In other words, just have here:
> if (hsts_file_access_valid (filename))"

New patch attached.

> 
> Tim
> 

Regards,

- AJ
From 7f6db4a128e4f9ecd99a493d50b0fb58c1fc8aa6 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Wed, 6 Apr 2016 12:55:17 +0200
Subject: [PATCH] Check the HSTS file is not world-writable

 * hsts.c (hsts_file_access_valid): check that the file is a regular
   file, and that it's not world-writable.
   (hsts_store_open): if the HSTS database file does not meet the
   above requirements, disable HSTS at all.
---
 src/hsts.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/hsts.c b/src/hsts.c
index d5e0bee..b29f352 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -334,6 +334,22 @@ hsts_store_dump (hsts_store_t store, FILE *fp)
 }
 }
 
+/*
+ * Test:
+ *  - The file is a regular file (ie. not a symlink), and
+ *  - The file is not world-writable.
+ */
+static bool
+hsts_file_access_valid (const char *filename)
+{
+  struct_stat st;
+
+  if (stat (filename, &st) == -1)
+return false;
+
+  return !(st.st_mode & S_IWOTH) && S_ISREG (st.st_mode);
+}
+
 /* HSTS API */
 
 /*
@@ -471,7 +487,7 @@ hsts_store_open (const char *filename)
   store->table = hash_table_new (0, hsts_hash_func, hsts_cmp_func);
   store->last_mtime = 0;
 
-  if (file_exists_p (filename))
+  if (hsts_file_access_valid (filename))
 {
   fp = fopen (filename, "r");
 
@@ -486,6 +502,18 @@ hsts_store_open (const char *filename)
   if (fstat (fileno (fp), &st) == 0)
 store->last_mtime = st.st_mtime;
 }
+  else if (file_exists_p (filename))
+{
+  /*
+   * If we're not reading the HSTS database,
+   * then by all means act as if HSTS was disabled.
+   */
+  hsts_store_close (store);
+  xfree (store);
+
+  logprintf (LOG_NOTQUIET, "Will not apply HSTS. "
+ "The HSTS database must be a regular and non-world-writable file.\n");
+}
 
 out:
   if (fp)
-- 
2.1.4



Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-24 Thread Ander Juaristi
Hi Tim,

On 24/05/16 13:15, Tim Ruehsen wrote:
> Hi Ander,
> 
> after applying your patch I still see changes in store->table (resp. changes 
> of the contents of the entries) without tagging store as changed.
> 
> Everywhere you change something that gets dumped to the disk database *must* 
> set the store->changed flag.

We only write to the file once - at the end. So that is essentially the
same as setting the changed flag when something changes in memory.

But I suspect I'm not following what you say...

> 
> For example:
> Executing
>   wget -d www.yahoo.com
> twice shows updating the HSTS database only for the first time (taking a nap 
> of two seconds between) - the max_age should be updated in the database for 
> both invocations.

That behavior is correct.

The value of max-age does not change - it remains the same every time
you send a request. We don't update the file if the values reported
(max-age, includeSubdomains, etc.) haven't changed since the last time
we stored them.

This is the workflow: http://www.yahoo.com --> https://www.yahoo.com -->
https://es.yahoo.com/?p=us

And finally, it says:

Strict-Transport-Security: max-age=2592000

And it is always the same.

Thus, we store it the first time as:

es.yahoo.com    0   0   1464090336  2592000

And don't do anything else unless the Yahoo server sends different
values. Initially, we only checked whether max-age changed. Now, we also
check includeSubdomains, since my recent commit 2f1c6a0.
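(For reference, that line follows exactly the format hsts_read_database()
parses back with sscanf:

  sscanf (p, "%255s %d %d %lu %lu",
          host, &port, &include_subdomains,
          (unsigned long *) &created, (unsigned long *) &max_age);

i.e. host, port (0 meaning the scheme default), the includeSubdomains
flag, the creation time, and max-age; so the entry above is valid for
2592000 seconds (30 days) counted from 1464090336.)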

> 
> Tim
> 




Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-24 Thread Ander Juaristi
Yes, I guess you're right.

Have a look at this.

On 24/05/16 11:48, Tim Ruehsen wrote:
>> But
>> again, I don't think it pays off for the sole reason of removing a debug
>> message.
> 
> Sorry, I forgot to mention, that it is about writing the database when it is 
> not needed. Just think of someone having a large HSTS database in combination 
> with lots of short-lived invocations of wget.
> 
> Tim
> 
From c2f2c4343e635a6b8c0f10daa94d5cefde59dfed Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Tue, 24 May 2016 11:14:38 +0200
Subject: [PATCH] Correct HSTS debug message

 * src/main.c (save_hsts): save the in-memory HSTS database to a file
   only if something changed.
 * src/hsts.c (struct hsts_store): new field 'changed'.
   (hsts_store_entry): update field 'changed' accordingly.
   (hsts_store_has_changed): new function.
 * src/hsts.h (hsts_store_has_changed): new function.
---
 src/hsts.c | 17 -
 src/hsts.h |  1 +
 src/main.c |  8 +---
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index d5e0bee..1337a2a 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -53,6 +53,7 @@ as that of the covered work.  */
 struct hsts_store {
   struct hash_table *table;
   time_t last_mtime;
+  bool changed;
 };
 
 struct hsts_kh {
@@ -423,7 +424,10 @@ hsts_store_entry (hsts_store_t store,
   if (entry && match == CONGRUENT_MATCH)
 {
   if (max_age == 0)
-hsts_remove_entry (store, kh);
+{
+  hsts_remove_entry (store, kh);
+  store->changed = true;
+}
   else if (max_age > 0)
 {
   if (entry->max_age != max_age ||
@@ -436,6 +440,8 @@ hsts_store_entry (hsts_store_t store,
 entry->created = t;
   entry->max_age = max_age;
   entry->include_subdomains = include_subdomains;
+
+  store->changed = true;
 }
 }
   /* we ignore negative max_ages */
@@ -450,6 +456,8 @@ hsts_store_entry (hsts_store_t store,
  happen we got a non-existent entry with max_age == 0.
   */
   result = hsts_add_entry (store, host, port, max_age, include_subdomains);
+  if (result)
+store->changed = true;
 }
   /* we ignore new entries with max_age == 0 */
   xfree (kh->host);
@@ -470,6 +478,7 @@ hsts_store_open (const char *filename)
   store = xnew0 (struct hsts_store);
   store->table = hash_table_new (0, hsts_hash_func, hsts_cmp_func);
   store->last_mtime = 0;
+  store->changed = false;
 
   if (file_exists_p (filename))
 {
@@ -531,6 +540,12 @@ hsts_store_save (hsts_store_t store, const char *filename)
 }
 }
 
+bool
+hsts_store_has_changed (hsts_store_t store)
+{
+  return (store ? store->changed : false);
+}
+
 void
 hsts_store_close (hsts_store_t store)
 {
diff --git a/src/hsts.h b/src/hsts.h
index 9a7043d..6e5bbc8 100644
--- a/src/hsts.h
+++ b/src/hsts.h
@@ -43,6 +43,7 @@ hsts_store_t hsts_store_open (const char *);
 
 void hsts_store_save (hsts_store_t, const char *);
 void hsts_store_close (hsts_store_t);
+bool hsts_store_has_changed (hsts_store_t);
 
 bool hsts_store_entry (hsts_store_t,
enum url_scheme, const char *, int,
diff --git a/src/main.c b/src/main.c
index ed050a5..e7d5c66 100644
--- a/src/main.c
+++ b/src/main.c
@@ -204,10 +204,12 @@ save_hsts (void)
 {
   char *filename = get_hsts_database ();
 
-  if (filename)
-DEBUGP (("Saving HSTS entries to %s\n", filename));
+  if (filename && hsts_store_has_changed (hsts_store))
+{
+  DEBUGP (("Saving HSTS entries to %s\n", filename));
+  hsts_store_save (hsts_store, filename);
+}
 
-  hsts_store_save (hsts_store, filename);
   hsts_store_close (hsts_store);
 
   xfree (filename);
-- 
2.1.4



Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-24 Thread Ander Juaristi
Sorry, in main.c it should have been

if (filename && hsts_store->changed)

On 24/05/16 11:23, Ander Juaristi wrote:
> Hi Tim,
> 
> On 24/05/16 10:24, Tim Ruehsen wrote:
>> Hi Ander,
>>
>> IMO, another possibility is to add a flag to 'struct hsts_store' that 
>> indicates any change made. hsts_store_save() could be skipped if that flag 
>> is 
>> not set.
>>
>> At the same time the debug info has to be moved from main.c/save_hsts() to 
>> hsts.c/hsts_store_save() OR hsts.c needs another function to return the 
>> value 
>> of the flag, so that save_hsts() could check it.
>>
> 
> You mean something like the attached?
> 
>> WDYT ?
> 
> That could be an option. But I'm not sure we can skip the call to
> hsts_store_save at all, because doing so would also mean not updating
> the in-memory HSTS database, which might have been modified by other
> wget processes.
> 
> We could also test whether there have been such changes (within
> hsts_store_save), and then call hsts_store_dump only if affirmative. But
> again, I don't think it pays off for the sole reason of removing a debug
> message.
> 
>>
>> Tim
>>



Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-24 Thread Ander Juaristi
Hi Tim,

On 24/05/16 10:24, Tim Ruehsen wrote:
> Hi Ander,
> 
> IMO, another possibility is to add a flag to 'struct hsts_store' that 
> indicates any change made. hsts_store_save() could be skipped if that flag is 
> not set.
> 
> At the same time the debug info has to be moved from main.c/save_hsts() to 
> hsts.c/hsts_store_save() OR hsts.c needs another function to return the value 
> of the flag, so that save_hsts() could check it.
> 

You mean something like the attached?

> WDYT ?

That could be an option. But I'm not sure we can skip the call to
hsts_store_save at all, because doing so would also mean not updating
the in-memory HSTS database, which might have been modified by other
wget processes.

We could also test whether there have been such changes (within
hsts_store_save), and then call hsts_store_dump only if affirmative. But
again, I don't think it pays off for the sole reason of removing a debug
message.

> 
> Tim
> 
From 5673896b0eca6a0707f8c0971002e7571f0dd9ba Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Tue, 24 May 2016 11:14:38 +0200
Subject: [PATCH] Correct HSTS debug message

---
 src/hsts.c | 11 ++-
 src/main.c |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index d5e0bee..c04ee73 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -53,6 +53,7 @@ as that of the covered work.  */
 struct hsts_store {
   struct hash_table *table;
   time_t last_mtime;
+  bool changed;
 };
 
 struct hsts_kh {
@@ -423,7 +424,10 @@ hsts_store_entry (hsts_store_t store,
   if (entry && match == CONGRUENT_MATCH)
 {
   if (max_age == 0)
-hsts_remove_entry (store, kh);
+{
+  hsts_remove_entry (store, kh);
+  store->changed = true;
+}
   else if (max_age > 0)
 {
   if (entry->max_age != max_age ||
@@ -436,6 +440,8 @@ hsts_store_entry (hsts_store_t store,
 entry->created = t;
   entry->max_age = max_age;
   entry->include_subdomains = include_subdomains;
+
+  store->changed = true;
 }
 }
   /* we ignore negative max_ages */
@@ -450,6 +456,8 @@ hsts_store_entry (hsts_store_t store,
  happen we got a non-existent entry with max_age == 0.
   */
   result = hsts_add_entry (store, host, port, max_age, include_subdomains);
+  if (result)
+store->changed = true;
 }
   /* we ignore new entries with max_age == 0 */
   xfree (kh->host);
@@ -470,6 +478,7 @@ hsts_store_open (const char *filename)
   store = xnew0 (struct hsts_store);
   store->table = hash_table_new (0, hsts_hash_func, hsts_cmp_func);
   store->last_mtime = 0;
+  store->changed = false;
 
   if (file_exists_p (filename))
 {
diff --git a/src/main.c b/src/main.c
index ed050a5..4189414 100644
--- a/src/main.c
+++ b/src/main.c
@@ -204,7 +204,7 @@ save_hsts (void)
 {
   char *filename = get_hsts_database ();
 
-  if (filename)
+  if (hsts_store->changed)
 DEBUGP (("Saving HSTS entries to %s\n", filename));
 
   hsts_store_save (hsts_store, filename);
-- 
2.1.4



Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-23 Thread Ander Juaristi
Hi,

I would leave it unchanged. For me this is a WONTFIX, for the following
reasons:

 1. The message is only printed when debug output (-d) is enabled. That
is disabled by default. Any user who enables it is expected to be wise
enough to know how to interpret the output, or at least treat it with care.
 2. Solving this would require checking whether the scheme is 'ftp://'
and in the case of HTTP(S), further checking whether the
Strict-Transport-Security header was set (in the case of HTTPS), or we
were redirected to the HTTPS entry point of the site and that entry
point sets it. This adds extra unnecessary complexity for the single
reason of hiding an output that only appears in debug mode. IMO it does
not pay off.
 3. The HSTS file is read at the beginning, and written at the end. That
is the best way of doing it, and the way other UAs work. A simpler
solution than that proposed at point 2 would require putting the HSTS
load/save routines in other place, maybe checking them on a per-URL
basis. This also does not pay off IMO.

The best 'fix' that comes to my mind is a compromise. Don't remove the
message (for the reasons mentioned), but print how many HSTS entries
have been read/updated/written. Something like:

Saving HSTS entries to /home/strunk/.wget-hsts (read: 1, updated: 0)

I would do either this or nothing. Tell me if this is acceptable.

On 21/05/16 11:21, vasele...@yahoo.gr wrote:
> this is displayed on every run even with hosts that have no HSTS, and
> with FTP and HTTP hosts.
> 
> the savannah 'bug submit' is broken.
> 
> 

Best regards,

- AJ



Re: [Bug-wget] 'Saving HSTS entries to' bug

2016-05-21 Thread Ander Juaristi
Hi,

This seems like a trivial problem. I'll get back to you later.

Thanks.

On 21/05/16 11:21, vasele...@yahoo.gr wrote:
> this is displayed on every run even with hosts that have no HSTS, and
> with FTP and HTTP hosts.
> 
> the savannah 'bug submit' is broken.
> 
> 




Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-04-08 Thread Ander Juaristi

Hi Tim,

You're right, but the downside is that it could cause a denial of 
service. I was guided by section 14.5 of RFC 6797. If another user 
modifies your HSTS file and adds an entry for a server that does not 
have HTTPS enabled, then you won't be able to contact that server, 
because wget will attempt to reach it via HTTPS every time.
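For illustration, here is a hypothetical line such a user could append
to a world-writable HSTS database (the hostname and timestamps are made
up; the fields follow the same host / port / includeSubdomains /
created / max-age layout the store uses):

  plain.example.com    0    1    1464090336    31536000

wget would then rewrite every http://plain.example.com URL to https://,
which fails whenever that host doesn't serve TLS at all.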


WDYT?

On 07/04/16 12:52, Tim Ruehsen wrote:

On Wednesday 06 April 2016 14:31:17 Juaristi Álamos, Ander wrote:

Hi all,

Here are some patches for HSTS.

  - 0001: checks the HSTS database file is not world-writable, and
refuses to read it if it is, and disables HSTS. This was in my original


Doesn't it make sense to share the HSTS database globally ? It is basically
global data (domain specific) and not user specific.

Thinking forward, a central (trusted) database/daemon for HSTS entries would
be nice - sooner or later almost any domain supports HSTS. Each process
loading/saving a huge file would not be efficient.

Same goes for e.g. cert pinning (but not for cookies which are private data).

Regards, Tim





Re: [Bug-wget] export wget history varible

2016-03-30 Thread Ander Juaristi



On 30/03/16 12:26, Tim Rühsen wrote:

Hi Ander,

very good explanation regarding history.




I am working on a (limited) interactive wget that accepts some commands
from the user and allows autocompletion of links and other goodies, but
that's still work in progress [1], and if some day sees the light, it
will be as part of wget2, the next major release. I'll discuss this with
the maintainers when the time comes, this is not the right thread. I'm
expecting to finish it during summer.



A scriptable wget shell ? With read/write access to all internal structures as
they were part of a database ? With an SQL like query language... (just
dreaming ...).



Well, I wasn't covering all those points, but yes, the high-level idea I 
had in mind resembles your description.



I am very excited to see your work !

Regards, Tim





Re: [Bug-wget] [PATCH] Some fixes for wget.texi

2016-03-29 Thread Ander Juaristi

Hi Tim,

On 24/03/16 22:32, Tim Rühsen wrote:

On Thursday, 24 March 2016, 22:11:59, Ander Juaristi wrote:

Hi Tim,

On 23/03/16 12:55, Tim Ruehsen wrote:

I fixed a few things (basically URLs) in wget.texi.

There are still a few URLs left over where I wasn't sure. Especially FTP
examples since example.org doesn't have a FTP server.

Please review and comment

Changes overview:
  * wget.texi: Replace server.com by example.com,

replace ftp://wuarchive.wustl.edu by https://example.com,
use HTTPS instead of HTTP where possible,
replace references to addictivecode.org by
www.gnu.org/software/wget,


AFAIK addictivecode.org is maintained by Micah Cowan, who is no longer
taking part in wget. I'm fine with that, as I am with removing all
@addictivecode.org e-mail addresses from the docs.

But IMO wget.addictivecode.org has great documentation about some wget
aspects not found in the official docs[1], such as how to compile
(dependencies required, etc.) or how to navigate the source, which, in
spite of all the changes the code has undergone over time, is still
relevant.

Lots of information there proved to be really useful to me when I
started working on wget, and it still is. I have a couple of pages
bookmarked in my browser. Removing all references to Micah's docs
without first migrating them to the official docs is a real pity for me.

There's still a link in the official site[2], but keeping a link there
and not in the texi looks kind of inconsistent to me.

[1] http://www.gnu.org/software/wget/manual/html_node/index.html
[2] http://www.gnu.org/software/wget/


Hi Ander,

this is a good point, thanks for looking at the patch.

Could you direct me exactly to the parts on addictivecode.org that you think
are important for wget (or are missing in the wget docs) !?


I would keep these:

[1] http://wget.addictivecode.org/NavigatingTheSource
[2] http://wget.addictivecode.org/CompilingRepoSources
[3] http://wget.addictivecode.org/PatchGuidelines
[4] http://wget.addictivecode.org/RepositoryAccess

Although minor modifications might be required, all of these are good 
resources and still relevant, except that
- [2] needs to add a few new dependencies (libmetalink comes to mind), and
- [3] should be updated to include git. I've never seen Mercurial around 
since I joined this list.


I think [4] already appears in the home page, so no need to keep it.


I would like to make a list and than get in touch with Micah. Maybe he can
give us a helping hand or at least I would like to ask for his approval to
copy the important parts of his docs.

In the mid-term all the docs / information should be on one site that is
independent of a single person.

Regards, Tim



Regards,
 - AJ



Re: [Bug-wget] export wget history varible

2016-03-29 Thread Ander Juaristi
*_HISTORY environment variables were used by convention by *interactive* 
applications (applications that offer their own CLI with their own 
commands) to establish the path of the history file. These applications 
often used GNU readline to read commands from the user, and the 
management of the history file was also handled by readline itself. 
That's why many interactive applications mimicked the behavior, because 
it was easy to implement - readline handled it for you.
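As a rough illustration of that pattern (this is not wget code; the 
prompt and history path are made up):

#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>
#include <readline/history.h>

int
main (void)
{
  const char *histfile = "/home/user/.history/mytool";  /* illustrative */
  char *line;

  read_history (histfile);            /* load the previous session's history */

  while ((line = readline ("> ")))    /* prompt with line editing */
    {
      if (*line)
        add_history (line);           /* remember non-empty commands */
      /* ... dispatch the command here ... */
      free (line);
    }

  write_history (histfile);           /* persist the history on exit */
  return 0;
}

Such a tool would typically read its *_HISTFILE environment variable with 
getenv() and pass the result to read_history()/write_history(); wget has 
no command loop, so there is nothing to hook this into.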


Since wget is not an interactive command, there's no need for such an 
environment variable. The variable makes sense in your examples: the MySQL 
client, scapy and python all feature a built-in CLI, but that's not the case 
for wget. As Tim said, your shell is likely to store the latest commands 
in a history file for you. If you really want to store your history in 
~/.history/wget, then append 'export HISTFILE=~/.history/wget' to your 
~/.bashrc. But again, that's outside the scope of wget.



I am working on a (limited) interactive wget that accepts some commands 
from the user and allows autocompletion of links and other goodies, but 
that's still work in progress [1], and if it some day sees the light, it 
will be as part of wget2, the next major release. I'll discuss this with 
the maintainers when the time comes; this is not the right thread. I'm 
expecting to finish it during the summer.



The bottom line is that current wget is not interactive and so there's 
no need for that feature.


[1] https://github.com/juaristi/wget2/tree/wget2-interactive-mode

On 29/03/16 11:01, VendForce Security wrote:

I want to put all the console history files in one place

In mysql you can do
export MSQL_HISTFILE=~/.history/mysql
and the history file is created and stored

I can so this with mysql, bash, python, postgres, scapy

How can I do it with wget in linux using the $HOME/.bashrc

I've tried

export WGET_HISTORY=~/.history/wget
export WGET_HIST=~/.history/wget
export WGET_HIST=~/.history/wget

scapy devs have made a commit, so I'm only left with wget now. The rest
have or had  enviroment varible for export history

is there varible for history ?





Re: [Bug-wget] [PATCH] Some fixes for wget.texi

2016-03-24 Thread Ander Juaristi

Hi Tim,

On 23/03/16 12:55, Tim Ruehsen wrote:

I fixed a few things (basically URLs) in wget.texi.

There are still a few URLs left over where I wasn't sure. Especially FTP
examples since example.org doesn't have a FTP server.

Please review and comment

Changes overview:
 * wget.texi: Replace server.com by example.com,
   replace ftp://wuarchive.wustl.edu by https://example.com,
   use HTTPS instead of HTTP where possible,
   replace references to addictivecode.org by www.gnu.org/software/wget,


AFAIK addictivecode.org is maintained by Micah Cowan, who is no longer 
taking part in wget. I'm fine with that, as I am with removing all 
@addictivecode.org e-mail addresses from the docs.


But IMO wget.addictivecode.org has great documentation about some wget 
aspects not found in the official docs[1], such as how to compile 
(dependencies required, etc.) or how to navigate the source, which, in 
spite of all the changes the code has undergone over time, is still 
relevant.


Lots of information there proved to be really useful to me when I 
started working on wget, and it still is. I have a couple of pages 
bookmarked in my browser. Removing all references to Micah's docs 
without first migrating them to the official docs is a real pity for me.


There's still a link in the official site[2], but keeping a link there 
and not in the texi looks kind of inconsistent to me.


[1] http://www.gnu.org/software/wget/manual/html_node/index.html
[2] http://www.gnu.org/software/wget/


   fix list archive reference,
   remove reference to wget-not...@addictivecode.org,
   change bugtracker URL to bugtracker on Savannah,
   replace yoyodyne.com by example.com,
   fix URL to VMS port


Best regards,

 - AJ



Re: [Bug-wget] Google Summer of Code 2016

2016-03-06 Thread Ander Juaristi
I just wanted to share with you another idea I've been thinking about for 
some time now: WebDriver [1].


It's basically a protocol/API to communicate with UAs. It's intended to 
be UA-agnostic, so any client should be able to use WebDriver to 
communicate with a compliant UA. From the standard:


"WebDriver is a remote control interface that enables introspection 
and control of user agents. It provides a platform- and language-neutral 
wire protocol as a way for out-of-process programs to remotely instruct 
the behaviour of web browsers."


There are some requirements not at all supported in wget, such as XPath 
DOM traversal, so at first glance I can't estimate how much time would 
be needed for this. It certainly won't be short, and might be too big 
for a GSoC.


Regards,
- AJ

[1] https://www.w3.org/TR/webdriver/

On 03/03/2016 at 11:21, Tim Ruehsen wrote:

Just more ideas for you, Kushagra:

There are many command line options from Wget still missing in Wget2, you
should have a look at
https://github.com/rockdaboot/wget2/wiki anyways - feel free to work on the
wiki yourself (e.g. fork the wiki pages:
https://help.github.com/articles/adding-and-editing-wiki-pages-locally/ or let
me know and I'll give you write access).

You can search the Wget bug tracker
(https://savannah.gnu.org/bugs/?group=wget) for wishlist items.
My favorite is https://savannah.gnu.org/bugs/?45803.
Special popen(2|3) functions/code is already in libwget/ directory.
E.g., that would allow Wget2 to be used as part of a recursive website malware
checker.

The authorization code in the test suite is not complete/not implemented - I
once tested authorization (MD5, MD5-sess) 'by hand' with my local Apache. But
a automated test is badly needed.

We thought of a statistic module (very basic code exists) for spider mode to
output diagnostics very detailed. Missing pages, response times, server load
(e.g. using the RTT/ping time), etc.

Tim

On Wednesday 02 March 2016 10:51:02 Kushagra Singh wrote:

Hi,

Thanks for the quick reply. I went through the repository and the issues,
and found a couple of things I would like to work on.

I have a couple of questions about Wget2. Is it a complete rewrite of the
Wget project, available at git://git.savannah.gnu.org/wget.git, or are we
using existing code and extending functionality? I guess it is the second
one because I saw `libwget` in the repo. However if such is the case, then
how do we change existing functions in wget? For example, implementing [2]
would require making changes to the file cookies.c, which is present in
/src in the wget repo, but not in /src in the wget2 repo.

I was looking at #43 [1], and have already submitted a patch for
consideration for the first suggestion [2]. The second suggestion mentioned
[3] is one of the things I'd like to work on, however this is not something
which will take three months :)

Another project I am interested in, is implementing FTPS. I saw this listed
under one of the ideas of GSoC 2015, but I'm not sure whether it was
implemented, as I didn't see it under 'Development Status' in the wget2
readme on Github.

Also, in #67 [4], we are talking about adhering to some specific parts of
RFC 7230. I'm not sure which all parts would be right, as the discussion
thread mentions that it won't be good to stick to each point of the RFC.
WDYT?


[1] https://github.com/rockdaboot/wget2/issues/43
[2] https://tools.ietf.org/html/draft-west-leave-secure-cookies-alone-04
[3] https://tools.ietf.org/html/draft-west-cookie-prefixes-05
[4] https://github.com/rockdaboot/wget2/issues/67

On Tue, Mar 1, 2016 at 9:57 PM, Giuseppe Scrivano  wrote:

Kushagra Singh  writes:

Hi,

Will we be taking part in GSoC this year? I would really like to work on


a


project related to Wget this summer. Any specific ideas that are of
importance to the community presently?


yes, we will be take part in GSoC.  I think we would like to see more
work happening on wget2, at the moment there is a list of issues on

github that can be useful to you to pick some ideas to work on:
   https://github.com/rockdaboot/wget2/issues

Could you take a look at it?  Do you see anything interesting that you
would like to work on?

Regards,
Giuseppe




Re: [Bug-wget] Google Summer of Code 2016

2016-03-04 Thread Ander Juaristi
> You mentioned FTPS... Ander Juaristi implemented this for Wget during GSOC
> 2015. Wget2 currently is lacking FTP and FTPS support (I just added some code
> for the test suite - tested only with Wget).

Yes, I wrote FTPS in wget, although the implementation is not complete.

There are some FTPS commands, such as CCC, that were impossible to implement 
with the current wget SSL/TLS API. Implementing them would require enhancing 
the SSL/TLS API. I have some notes at home about how to do that, and promised I 
would show them to you, but still haven't. My fault. I'll try to do it 
tomorrow, since today I'm in a hotel in the center of Madrid, and won't be able 
to.

Right now, wget2 lacks both FTP and FTPS support. So I guess you have to first 
implement FTP in order to have FTPS. In theory there's no technical impediment 
to implementing FTPS directly, but it makes more sense to have FTP first, since 
FTPS just extends FTP to tunnel its traffic through TLS.
 
Regards,

- AJ



[Bug-wget] [bug #46943] Crash on old CPU w/o SSE2

2016-02-04 Thread Ander Juaristi
Follow-up Comment #5, bug #46943 (project wget):

Right, so I would mark this as a WONTFIX, and keep it here as a reference in
case someone complains again (this is implicit :D).





Re: [Bug-wget] Implementing draft to update RFC6265

2016-01-31 Thread Ander Juaristi



On 01/30/2016 09:31 PM, Kushagra Singh wrote:

Hi,

I'm a bit stuck while writing tests. How do I test the fact that a secure
only cookie does not get saved over an insecure connection? Even if the
cookie gets saved, it will not be transmitted over an insecure connection
(cookie_matches_url() ensures that). So even though I can see in the log
that the cookie is not saved, I can't figure out how exactly to test that
in the test suite, since I cannot check using RejectHeader. Please find
attached the test I have written.



I've tried passing '--save-cookies=mycookies.wget', which creates a new file in 
the test's directory, and then adding that file to the 
'ExpectedDownloadedFiles' list, for Wget to check that it's there. The problem 
is that the content check always fails: Wget writes comments at the top of the 
cookie file that include the time and date, so the contents never match.

I guess the solution here would be to create a hook and test that a cookie file 
has been created, which has nothing but the initial comments.

I don't know if that would be possible. @Darshit?


And one thing I noticed, Test-Proto.py tries to import HTTP and HTTPS
classes from " misc.constants", which is wrong. It should be imported from
test.base_test right?

Regards,
Kushagra



Regards,
- AJ



Re: [Bug-wget] Implementing draft to update RFC6265

2016-01-31 Thread Ander Juaristi

The test looks good to me, but I think I've spotted a bug _in the test engine_ 
where the 'RejectHeader' rule doesn't get enforced.

You can strip the 'secure' parameter from this testcase and it will still pass. 
The check read 'if not header_recd and header_recd == rej_headers[header_line]', 
which can never be true: when header_recd is empty it cannot equal the 
blacklisted value, so the 400 error was never sent. I've written a patch 
(dropping the 'not') to fix this.

I.e. this:

---request begin---
GET /File2 HTTP/1.1
User-Agent: Wget/1.16.3.168-be847 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: 127.0.0.1:44832
Connection: Keep-Alive
Cookie: sess-id=0213

---request end---
HTTP request sent, awaiting response... 127.0.0.1 - - [31/Jan/2016 17:33:20] "GET 
/File2 HTTP/1.1" 200 -

---response begin---
HTTP/1.1 200 OK
Server: BaseHTTP/0.6 Python/3.4.3+
Date: Sun, 31 Jan 2016 16:33:20 GMT
content-length: 29
content-type: text/plain

versus this:

---request begin---
GET /File2 HTTP/1.1
User-Agent: Wget/1.16.3.168-be847 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: 127.0.0.1:37251
Connection: Keep-Alive
Cookie: sess-id=0213

---request end---
HTTP request sent, awaiting response... 127.0.0.1 - - [31/Jan/2016 17:34:18] 
code 400, message Blacklisted Header Cookie received
127.0.0.1 - - [31/Jan/2016 17:34:18] "GET /File2 HTTP/1.1" 400 -

---response begin---
HTTP/1.1 400 Blacklisted Header Cookie received
Server: BaseHTTP/0.6 Python/3.4.3+
Date: Sun, 31 Jan 2016 16:34:18 GMT
Content-Type: text/html;charset=utf-8
Connection: close
Content-Length: 483

---response end---
400 Blacklisted Header Cookie received
Header Cookie received
URI content encoding = ‘utf-8’
Disabling further reuse of socket 3.
Closed fd 3
2016-01-31 17:34:18 ERROR 400: Blacklisted Header Cookie received.

On 01/30/2016 09:31 PM, Kushagra Singh wrote:

Hi,

I'm a bit stuck while writing tests. How do I test the fact that a secure
only cookie does not get saved over an insecure connection? Even if the
cookie gets saved, it will not be transmitted over an insecure connection
(cookie_matches_url() ensures that). So even though I can see in the log
that the cookie is not saved, I can't figure out how exactly to test that
in the test suite, since I cannot check using RejectHeader. Please find
attached the test I have written.

And one thing I noticed, Test-Proto.py tries to import HTTP and HTTPS
classes from " misc.constants", which is wrong. It should be imported from
test.base_test right?

Regards,
Kushagra



Regards,
- AJ
From 325c1de3894b86b7a708ea56cb45acfc59ebbfb7 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Sun, 31 Jan 2016 17:27:11 +0100
Subject: [PATCH] Enforce 'RejectHeader' rule in tests

 * server/http/http_server.py (_Handler.RejectHeader): enforce
   'RejectHeader' rule.
---
 testenv/server/http/http_server.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testenv/server/http/http_server.py b/testenv/server/http/http_server.py
index 78aa605..e96f6e8 100644
--- a/testenv/server/http/http_server.py
+++ b/testenv/server/http/http_server.py
@@ -369,7 +369,7 @@ class _Handler(BaseHTTPRequestHandler):
 rej_headers = header_obj.headers
 for header_line in rej_headers:
 header_recd = self.headers.get(header_line)
-if not header_recd and header_recd == rej_headers[header_line]:
+if header_recd and header_recd == rej_headers[header_line]:
 self.send_error(400, 'Blacklisted Header %s received' %
 header_line)
 self.finish_headers()
-- 
2.5.0



[Bug-wget] [bug #46943] Crash on old CPU w/o SSE2

2016-01-21 Thread Ander Juaristi
Follow-up Comment #1, bug #46943 (project wget):

Hi,

Could you be more specific, please? Some detail about the mechanics of the
crash would be helpful.

I don't think SSE2 is relevant here. AFAIK wget does not use any form of SIMD
processing, and we've seen it work in old CPUs.





Re: [Bug-wget] Fwd: New Defects reported by Coverity Scan for GNU Wget

2015-12-13 Thread Ander Juaristi
ee (store);

** CID 1273467:  API usage errors  (BUFFER_SIZE)
/lib/md5.c: 291 in md5_process_bytes()



*** CID 1273467:  API usage errors  (BUFFER_SIZE)
/lib/md5.c: 291 in md5_process_bytes()
285   memcpy (&((char *) ctx->buffer)[left_over], buffer, len);
286   left_over += len;
287   if (left_over >= 64)
288 {
289   md5_process_block (ctx->buffer, 64, ctx);
290   left_over -= 64;
>>> CID 1273467:  API usage errors  (BUFFER_SIZE)
>>> The source buffer "&ctx->buffer[16]" potentially overlaps with the destination 
buffer "ctx->buffer", which results in undefined behavior for memcpy.
291   memcpy (ctx->buffer, &ctx->buffer[16], left_over);
292 }
293   ctx->buflen = left_over;
294 }
295 }
296

** CID 1273466:  API usage errors  (BUFFER_SIZE)
/lib/sha256.c: 411 in sha256_process_bytes()



*** CID 1273466:  API usage errors  (BUFFER_SIZE)
/lib/sha256.c: 411 in sha256_process_bytes()
405   memcpy (&((char *) ctx->buffer)[left_over], buffer, len);
406   left_over += len;
407   if (left_over >= 64)
408 {
409   sha256_process_block (ctx->buffer, 64, ctx);
410   left_over -= 64;
>>> CID 1273466:  API usage errors  (BUFFER_SIZE)
>>> The source buffer "&ctx->buffer[16]" potentially overlaps with the destination 
buffer "ctx->buffer", which results in undefined behavior for memcpy.
411   memcpy (ctx->buffer, &ctx->buffer[16], left_over);
412 }
413   ctx->buflen = left_over;
414 }
415 }
416

** CID 1273463:  API usage errors  (BUFFER_SIZE)
/lib/sha1.c: 278 in sha1_process_bytes()



*** CID 1273463:  API usage errors  (BUFFER_SIZE)
/lib/sha1.c: 278 in sha1_process_bytes()
272   memcpy (&((char *) ctx->buffer)[left_over], buffer, len);
273   left_over += len;
274   if (left_over >= 64)
275 {
276   sha1_process_block (ctx->buffer, 64, ctx);
277   left_over -= 64;
>>> CID 1273463:  API usage errors  (BUFFER_SIZE)
>>> The source buffer "&ctx->buffer[16]" potentially overlaps with the destination 
buffer "ctx->buffer", which results in undefined behavior for memcpy.
278   memcpy (ctx->buffer, &ctx->buffer[16], left_over);
279 }
280   ctx->buflen = left_over;
281 }
282 }
283

** CID 420711:  Insecure data handling  (INTEGER_OVERFLOW)
/lib/str-two-way.h: 221 in critical_factorization()



*** CID 420711:  Insecure data handling  (INTEGER_OVERFLOW)
/lib/str-two-way.h: 221 in critical_factorization()
215  lexicographic suffix of 'a' works for 'bba', but not 'ab' for
216  'aab'.  The shorter suffix of the two will always be a critical
217  factorization.  */
218   if (max_suffix_rev + 1 < max_suffix + 1)
219 return max_suffix + 1;
220   *period = p;
>>> CID 420711:  Insecure data handling  (INTEGER_OVERFLOW)
>>> Overflowed or truncated value (or a value computed from an overflowed or 
truncated value) "max_suffix_rev + 1UL" used as return value.
221   return max_suffix_rev + 1;
222 }
223
224 /* Return the first location of non-empty NEEDLE within HAYSTACK, or
225NULL.  HAYSTACK_LEN is the minimum known length of HAYSTACK.  This
226method is optimized for NEEDLE_LEN < LONG_NEEDLE_THRESHOLD.



To view the defects in Coverity Scan visit,
https://scan.coverity.com/projects/gnu-wget?tab=overview

To manage Coverity Scan email notifications for "dar...@gmail.com",
click 
https://scan.coverity.com/subscriptions/edit?email=darnir%40gmail.com&token=a247cf0e017fe1ea3e52680a7e0c1fcf








From c78aee99ba099a61af26c9df4c072e6e7a65cb03 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Wed, 9 Dec 2015 17:12:51 +0100
Subject: [PATCH] Fix Coverity issues

* src/ftp.c (getftp): on error, close the file and attempt to remove it
  before exiting.
* src/hsts.c (hsts_store_open): update modification time in the end.
---
src/ftp.c  | 16 +---
src/hsts.c | 16 +---
2 files changed, 22 insertion

Re: [Bug-wget] GNU wget 1.17.1 released

2015-12-13 Thread Ander Juaristi

Hi Andries,

On 12/11/2015 09:03 PM, Andries E. Brouwer wrote:

On Fri, Dec 11, 2015 at 08:22:23PM +0100, Giuseppe Scrivano wrote:

Hello,

I am pleased to announce the new version of GNU wget.  We consider it a
bug fixes release as it addresses issues found in 1.17, which contained
quite a few new features.

Please report any problem you may experience to the bug-wget@gnu.org
mailing list.


Four months ago I mentioned two bugs:
1. Non-ASCII filenames are mistreated
2. The progress bar is broken when the filename is non-ASCII.
And I provided patches that fix 2, and fix 1 on Unix.

Tim Ruehsen polished the second patch a bit more, and then
nothing more was heard about it.
The first patch was for Unix, and there was some amount of discussion
of the Windows situation with Eli Zaretskii. But nobody offered
a Windows patch, and my patch is OK, but Unix-only.



True.

For me, there's no problem in applying the patch for Unix only (disabling the 
code on Windows via preprocessor guards), and it'll be ready when we release 
1.17.2. In the meantime maybe someone will come up with a working Windows 
port.

What do the others think?


Now that wget-1.17.1 is out, let me try it on the Russian Wikipedia page
for the page "heart", Сердце, the same example we used last August.

With my patch I get:

Saving to: ‘Сердце’

With wget-1.17.1 I get:

% wget/wget-1.17.1/src/wget 
https://ru.wikipedia.org/wiki/%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5

Saving to: ‘Се\321%80д\321%86е’.
Here wget saves to a name that is not a legal name on this filesystem.
The progress bar is still broken and contains illegal characters.

So, I find that wget-1.17.1 is still broken, and no good for downloading
files with UTF-8 filename from an UTF-8 site to a local UTF-8 site.
There is some default built-in ISO 8859-* ugliness.

Andries



Regards,
- AJ



Re: [Bug-wget] --no-check-cert does not avoid cert warning

2015-12-01 Thread Ander Juaristi



On 12/01/2015 09:31 AM, Tim Ruehsen wrote:

Are you working on *nix ?

Try wget ... |& grep -v "WARNING: cannot verify"

To filter out the warnings you don't want to see. You could use egrep to
filter different lines at once.



That's a good idea but IMO it's a bit hackish. What's more, `|&' is not a 
*nix feature, but rather a Bash feature.

It's basically `2>&1 |' behind the scenes, which is worth pointing out 
anyway [1].

[1] http://www.gnu.org/software/bash/manual/bashref.html#Pipelines


If you are working with self-signed certificates, use --ca-certificate=... to
allow wget perform a proper check.

Tim

On Monday 30 November 2015 17:43:54 Karl Berry wrote:

 An alternative to make  --no-check-certificate silent would be to
 provide a parameter to explicitely silence it:
   --no-check-certificate=quiet

Sure, sounds fine.  Or any other method ...




Regards,
- AJ



Re: [Bug-wget] bug in texlive 2015

2015-12-01 Thread Ander Juaristi

Hi there,
On 12/01/2015 03:40 PM, abouzar jafari wrote:

hi guys! i want to install the texlive 2015 but I can't do this!! always i saw 
this black page that i mentioned here, and i can't solve this problem, any how 
i hope you can help me to find a way for installing the texlive!
regards
abouzar



An identical issue was discussed not long ago here: 
http://lists.gnu.org/archive/html/bug-wget/2015-10/msg9.html

I'm afraid this is not a Wget problem, but rather a problem with the latex 
installer itself, and so there's nothing we can do about it. I suggest you 
reach out to the installer authors and ask them. They're better suited than 
us to help you.

Regards,
- AJ



[Bug-wget] [bug #46479] null pointer dereference: gnutls_free (ctx->session_data->data)

2015-11-20 Thread Ander Juaristi
Follow-up Comment #1, bug #46479 (project wget):

The attached patch should fix it. Thanks for the report.

(file #35506)
___

Additional Item Attachment:

File name: 0001-Fix-potential-NULL-pointer-dereference.patch Size:1 KB






Re: [Bug-wget] Wget 1.17 doesn't compile on Windows (hsts.c)

2015-11-19 Thread Ander Juaristi

Hi Tim,

On 11/18/2015 11:09 AM, Tim Ruehsen wrote:

Hi Dagobert, hi Ander,

I could reproduce and fix the problem on a 32bit Debian wheezy VM.

@Dagobert Please check if the attached patch works on Solaris as well
@Ander Please review that I did not get anything wrong



I moved the calls to xfree (kh->host) a few lines up because I couldn't pass the 
unit tests after applying your patch (64-bit Linux Mint). The reason was simply that 
kh->host had garbage, and thus free() complained.

And also:

PASS: Test--post-file.px
*** Error in `../src/wget': free(): invalid pointer: 0x00d2c5f0 ***
PASS: Test-proxied-https-auth.px
*** Error in `../src/wget': free(): invalid pointer: 0x021cc5f0 ***

@Dagobert please check that it still passes all the tests on 32-bit Solaris. 
For some reason I couldn't compile it on a 32-bit Debian, and had no time to 
look into it further.


Tim



Regards,
- AJ
From 9b8f45ca6f6861fab94bf7e463dadc9f83f540e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tim=20R=C3=BChsen?= 
Date: Wed, 18 Nov 2015 10:58:56 +0100
Subject: [PATCH] Fix HSTS memory issue + test code issue

* src/hsts.c (hsts_find_entry): Fix freeing memory
  (hsts_remove_entry): Remove freeing host member
  (hsts_match): Free host member here
  (hsts_store_entry): Free host member here
  (test_url_rewrite): Fix 'created' value
  (test_hsts_read_database): Fix 'created' value

Reported-by: Dagobert Michelsen 
---
 src/hsts.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index b0989c7..3ddbf72 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -148,13 +148,14 @@ hsts_find_entry (hsts_store_t store,
 end:
   /* restore pointer or we'll get a SEGV */
   k->host = org_ptr;
-  xfree (k->host);
 
   /* copy parameters to previous frame */
   if (match_type)
 *match_type = match;
   if (kh)
 memcpy (kh, k, sizeof (struct hsts_kh));
+  else
+xfree (k->host);
 
   xfree (k);
   return khi;
@@ -236,8 +237,7 @@ hsts_new_entry (hsts_store_t store,
 static void
 hsts_remove_entry (hsts_store_t store, struct hsts_kh *kh)
 {
-  if (hash_table_remove (store->table, kh))
-xfree (kh->host);
+  hash_table_remove (store->table, kh);
 }
 
 static bool
@@ -375,9 +375,10 @@ hsts_match (hsts_store_t store, struct url *u)
   else
 hsts_remove_entry (store, kh);
 }
+  xfree (kh->host);
 }
 
-  xfree(kh);
+  xfree (kh);
 
   return url_changed;
 }
@@ -451,9 +452,10 @@ hsts_store_entry (hsts_store_t store,
   result = hsts_add_entry (store, host, port, max_age, include_subdomains);
 }
   /* we ignore new entries with max_age == 0 */
+  xfree (kh->host);
 }
 
-  xfree(kh);
+  xfree (kh);
 
   return result;
 }
@@ -613,7 +615,7 @@ test_url_rewrite (hsts_store_t s, const char *url, int port, bool rewrite)
   if (rewrite)
 {
   if (port == 80)
-   mu_assert("URL: port should've been rewritten to 443", u.port == 443);
+mu_assert("URL: port should've been rewritten to 443", u.port == 443);
   else
 mu_assert("URL: port should've been left intact", u.port == port);
   mu_assert("URL: scheme should've been rewritten to HTTPS", u.scheme == SCHEME_HTTPS);
@@ -686,7 +688,7 @@ test_hsts_url_rewrite_superdomain (void)
   s = open_hsts_test_store ();
   mu_assert("Could not open the HSTS store", s != NULL);
 
-  created = hsts_store_entry (s, SCHEME_HTTPS, "www.foo.com", 443, time(NULL) + 1234, true);
+  created = hsts_store_entry (s, SCHEME_HTTPS, "www.foo.com", 443, 1234, true);
   mu_assert("A new entry should've been created", created == true);
 
   TEST_URL_RW (s, "www.foo.com", 80);
@@ -707,7 +709,7 @@ test_hsts_url_rewrite_congruent (void)
   s = open_hsts_test_store ();
   mu_assert("Could not open the HSTS store", s != NULL);
 
-  created = hsts_store_entry (s, SCHEME_HTTPS, "foo.com", 443, time(NULL) + 1234, false);
+  created = hsts_store_entry (s, SCHEME_HTTPS, "foo.com", 443, 1234, false);
   mu_assert("A new entry should've been created", created == true);
 
   TEST_URL_RW (s, "foo.com", 80);
@@ -726,6 +728,7 @@ test_hsts_read_database (void)
   char *home = home_dir();
   char *file = NULL;
   FILE *fp = NULL;
+  time_t created = time(NULL) - 10;
 
   if (home)
 {
@@ -734,9 +737,9 @@ test_hsts_read_database (void)
   if (fp)
 {
   fputs ("# dummy comment\n", fp);
-  fputs ("foo.example.com\t0\t1\t1434224817\t123123123\n", fp);
-  fputs ("bar.example.com\t0\t0\t1434224817\t456456456\n", fp);
-  fputs ("test.example.com\t8080\t0\t1434224817\t789789789\n", fp);
+  fprintf (fp, "foo.example.com\t0\t1\t%ld\t123\n",(long) created);
+  fprintf (fp, "bar.example.com\t0\t0\t%ld\t456\n", (long) created);
+  fprintf (fp, "test.example.com\t8080\t0\t%ld\t789\n", (long) created);
   fclose (fp);
 
   table = hsts_store_open (file);
-- 
1.9.1



Re: [Bug-wget] Wget2 Introduction

2015-10-14 Thread Ander Juaristi

Hi Tim,

Just something that came to my mind, regarding:

Wget2 will stay as an own executable separate from Wget - at least for a
while.
Until both tools merge one day...
So people can test Wget2 without endangering their existing architecture and
scripts.

I don't think it's a good idea to speculate about a future merge of Wget2 into 
Wget. It may happen one day, or it may never happen. I personally think Wget2 
will never merge into Wget strictly speaking; rather, one day we'll drop 
support for Wget and maintain Wget2 only. Of course, I might be wrong, but I 
think it's better not to say anything about this.

On 10/13/2015 12:38 PM, Tim Ruehsen wrote:

Hi,

I would like to introduce Wget2 on the wget mailing list to get some help in
finishing a first release.

I am really not a good text writer and want to ask you for help to
finish/polish the markdown text appended.

Please revise the text, ask if something is unclear, maybe something that you
want to see is missing here. The text should be clear, not too long or boring.

Please send me your changes / opinions - thanks in advance !

Regards, Tim



Regards,
- AJ



Re: [Bug-wget] To download files from urls within js or css

2015-10-10 Thread Ander Juaristi

On 10/08/2015 07:37 PM, Jimmy Willer Maco Elera wrote:

How I can do to detect the urls within the js or css to be downloaded files?



Pass -r/--recursive, and Wget will recursively follow all the links it finds. 
That includes JS and CSS.
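For example (the URL is illustrative):

  wget -r https://example.com/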

Regards,
- AJ



Re: [Bug-wget] [PATCH] Update HSTS info

2015-10-08 Thread Ander Juaristi

Hi Tim,

Thanks for your comments. I'm re-sending the first patch. No changes to the second.

On 10/08/2015 10:37 AM, Tim Ruehsen wrote:

+  flock (fd, LOCK_UN);
+  fclose (f);
You are using buffered I/O, fclose() is effectively a write() + close().
That opens a hole here if you unlock the file before write().
IMO, to avoid that hole you could just drop the explicit unlock. It will be
performed automatically when the file is closed by fclose().



You were right, the lock is released by fclose(). Obvious, isn't it? I mean, 
that's what one would expect.

I thought about this but I wasn't really sure. The docs are not 100% clear IMO:

"Furthermore, the lock is released either by an explicit LOCK_UN operation 
on any of these  duplicate  descriptors,
 or when all such descriptors have been closed."
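To make the ordering concrete, here is a minimal sketch (the file name
and the entry written are made up; this is not the actual wget code):

#include <stdio.h>
#include <sys/file.h>

static void
dump_locked (void)
{
  FILE *fp = fopen ("/tmp/hsts-demo", "w");
  if (!fp)
    return;

  flock (fileno (fp), LOCK_EX);
  /* fputs() may only fill the stdio buffer here; the bytes can reach
     the kernel as late as fclose().  */
  fputs ("example.com\t0\t0\t1444291200\t2592000\n", fp);

  /* UNSAFE: flock (fileno (fp), LOCK_UN); -- another process could
     grab the lock and read a half-written file.  */

  /* SAFE: fclose() flushes the buffer and then closes the descriptor,
     and closing the last descriptor releases the lock.  */
  fclose (fp);
}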

Regards,
- AJ
From dab29c0b517d4fedaf7df46bd80fc506dd3699ad Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Mon, 5 Oct 2015 23:03:45 +0200
Subject: [PATCH] Fix potential race condition

 * src/hsts.c (hsts_read_database): get an open file handle
   instead of a file name.
   (hsts_store_dump): get an open file handle
   instead of a file name.
   (hsts_store_open): open the file and pass the open file handle.
   (hsts_store_save): lock the file before the read-merge-dump
   process.

 Reported-by: Daniel Kahn Gillmor 
---
 src/hsts.c | 126 +
 1 file changed, 68 insertions(+), 58 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index 5c4ca35..ab2f41c 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -39,13 +39,16 @@ as that of the covered work.  */
 #include "c-ctype.h"
 #ifdef TESTING
 #include "test.h"
-#include  /* for unlink(), used only in tests */
 #endif
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct hsts_store {
   struct hash_table *table;
@@ -263,9 +266,8 @@ hsts_store_merge (hsts_store_t store,
 }
 
 static bool
-hsts_read_database (hsts_store_t store, const char *file, bool merge_with_existing_entries)
+hsts_read_database (hsts_store_t store, FILE *fp, bool merge_with_existing_entries)
 {
-  FILE *fp = NULL;
   char *line = NULL, *p;
   size_t len = 0;
   int items_read;
@@ -279,67 +281,54 @@ hsts_read_database (hsts_store_t store, const char *file, bool merge_with_existi
 
   func = (merge_with_existing_entries ? hsts_store_merge : hsts_new_entry);
 
-  fp = fopen (file, "r");
-  if (fp)
+  while (getline (&line, &len, fp) > 0)
 {
-  while (getline (&line, &len, fp) > 0)
-{
-  for (p = line; c_isspace (*p); p++)
-;
-
-  if (*p == '#')
-continue;
+  for (p = line; c_isspace (*p); p++)
+;
 
-  items_read = sscanf (p, "%255s %d %d %lu %lu",
-   host,
-   &port,
-   &include_subdomains,
-   (unsigned long *) &created,
-   (unsigned long *) &max_age);
+  if (*p == '#')
+continue;
 
-  if (items_read == 5)
-func (store, host, port, created, max_age, !!include_subdomains);
-}
-
-  xfree (line);
-  fclose (fp);
+  items_read = sscanf (p, "%255s %d %d %lu %lu",
+   host,
+   &port,
+   &include_subdomains,
+   (unsigned long *) &created,
+   (unsigned long *) &max_age);
 
-  result = true;
+  if (items_read == 5)
+func (store, host, port, created, max_age, !!include_subdomains);
 }
 
+  xfree (line);
+  result = true;
+
   return result;
 }
 
 static void
-hsts_store_dump (hsts_store_t store, const char *filename)
+hsts_store_dump (hsts_store_t store, FILE *fp)
 {
-  FILE *fp = NULL;
   hash_table_iterator it;
 
-  fp = fopen (filename, "w");
-  if (fp)
+  /* Print preliminary comments. We don't care if any of these fail. */
+  fputs ("# HSTS 1.0 Known Hosts database for GNU Wget.\n", fp);
+  fputs ("# Edit at your own risk.\n", fp);
+  fputs ("# [:]\t\t\t\n", fp);
+
+  /* Now cycle through the HSTS store in memory and dump the entries */
+  for (hash_table_iterate (store->table, &it); hash_table_iter_next (&it);)
 {
-  /* Print preliminary comments. We don't care if any of these fail. */
-  fputs ("# HSTS 1.0 Known Hosts database for GNU Wget.\n", fp);
-  fputs ("# Edit at your own risk.\n", fp);
-  fputs ("# [:]\t\t\t\n", fp);
+  struct hsts_kh *kh = (struct hsts_kh *) it.key;
+  struct hsts_kh_info *khi = (struct hsts_kh_info *) it.value;
 
-  /* Now cycle through the HSTS store in memory and dump 

Re: [Bug-wget] [PATCH] Update HSTS info

2015-10-07 Thread Ander Juaristi

Hi,

I've written two patches for the HSTS code. Please review them when you have 
time.

In August, dkg caught a potential race condition in Wget's HSTS code (replayed 
message below). The first patch aims to solve that issue using flock(2). It 
works well on my machine: I've tested it by spawning two Wget processes on top 
of gdb.

The second patch fixes what's in my opinion a critical bug. It only triggered 
when more than one Wget process used the same HSTS database simultaneously, 
and that's why it went unnoticed until now. I noticed it while I was working 
on the first patch, so it could as well be part of it, but IMO this bug is more 
related to the principles of our HSTS design itself than to the specific race 
condition issue.


On 08/18/2015 09:12 PM, Ander Juaristi wrote:

On 08/10/2015 06:06 PM, Daniel Kahn Gillmor wrote:



This sounds like there might still be a race condition where one
process's changes get clobbered.  can we make any atomicity guarantees
for users who might be concerned about this?


You're right. My fault not to take this into account. This could be fixed with 
flock/fcntl. I think they're both in gnulib.

These last two issues require code changes. I'll take the responsibility to fix 
them, but outside of GSoC.  The first one requires a bit of consensus before 
coding anything, but the second one seems a bit more straightforward. For now, 
I attach the two previous patches. The first one (the HSTS docs) amended with 
dkg's suggestions. If there're no more complaints, I think they can be pushed, 
because Wget's behaviour has not changed yet. When we implement any of the 
ideas proposed above, we'll update the docs.



Regards,
- AJ
>From 826d204421434a3c64d363b8b66c5bbc24657274 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Mon, 5 Oct 2015 23:03:45 +0200
Subject: [PATCH 1/2] Fix potential race condition

 * src/hsts.c (hsts_read_database): get an open file handle
   instead of a file name.
   (hsts_store_dump): get an open file handle
   instead of a file name.
   (hsts_store_open): open the file and pass the open file handle.
   (hsts_store_save): lock the file before the read-merge-dump
   process.

 Reported-by: Daniel Kahn Gillmor 

---
 src/hsts.c | 125 +
 1 file changed, 68 insertions(+), 57 deletions(-)

diff --git a/src/hsts.c b/src/hsts.c
index 5c4ca35..c7339b8 100644
--- a/src/hsts.c
+++ b/src/hsts.c
@@ -39,13 +39,16 @@ as that of the covered work.  */
 #include "c-ctype.h"
 #ifdef TESTING
 #include "test.h"
-#include  /* for unlink(), used only in tests */
 #endif
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct hsts_store {
   struct hash_table *table;
@@ -263,9 +266,8 @@ hsts_store_merge (hsts_store_t store,
 }
 
 static bool
-hsts_read_database (hsts_store_t store, const char *file, bool merge_with_existing_entries)
+hsts_read_database (hsts_store_t store, FILE *fp, bool merge_with_existing_entries)
 {
-  FILE *fp = NULL;
   char *line = NULL, *p;
   size_t len = 0;
   int items_read;
@@ -279,67 +281,54 @@ hsts_read_database (hsts_store_t store, const char *file, bool merge_with_existi
 
   func = (merge_with_existing_entries ? hsts_store_merge : hsts_new_entry);
 
-  fp = fopen (file, "r");
-  if (fp)
+  while (getline (&line, &len, fp) > 0)
 {
-  while (getline (&line, &len, fp) > 0)
-{
-  for (p = line; c_isspace (*p); p++)
-;
-
-  if (*p == '#')
-continue;
+  for (p = line; c_isspace (*p); p++)
+;
 
-  items_read = sscanf (p, "%255s %d %d %lu %lu",
-   host,
-   &port,
-   &include_subdomains,
-   (unsigned long *) &created,
-   (unsigned long *) &max_age);
-
-  if (items_read == 5)
-func (store, host, port, created, max_age, !!include_subdomains);
-}
+  if (*p == '#')
+continue;
 
-  xfree (line);
-  fclose (fp);
+  items_read = sscanf (p, "%255s %d %d %lu %lu",
+   host,
+   &port,
+   &include_subdomains,
+   (unsigned long *) &created,
+   (unsigned long *) &max_age);
 
-  result = true;
+  if (items_read == 5)
+func (store, host, port, created, max_age, !!include_subdomains);
 }
 
+  xfree (line);
+  result = true;
+
   return result;
 }
 
 static void
-hsts_store_dump (hsts_store_t store, const char *filename)
+hsts_store_dump (hsts_store_t store, FILE *fp)
 {
-  FILE *fp = NULL;
   hash_table_iterator it;
 
-  fp = fopen (filena

Re: [Bug-wget] [PATCH] Re: [RFE / project idea]: convert-links for "transparent proxy" mode

2015-10-01 Thread Ander Juaristi

Hi Gabriel,

You're right, silly me.

This is (hopefully) the final version.

Thanks,
- AJ

On 09/30/2015 02:43 PM, Gabriel L. Somlo wrote:

+@item --convert-file-only
+This option converts only the filename part of the URLs, leaving the rest
+of the URLs untouched. This filename part is sometimes referred to as the
+"basename", although we avoid that term here in order not to cause confusion.
+
+It works particularly well in conjunction with @samp{--convert-links}, although


shouldn't that be, "works particularly well with --adjust-extension"
instead ?


+this coupling is not enforced. It proves useful to populate Internet caches
+with files downloaded from different hosts.
+
+Example: if some link points to @file{//foo.com/bar.cgi?xyz} with
+@samp{--convert-links} asserted and its local destination is intended to be


again, shouldn't this be --adjust-extension ?

With the above manpage issues addressed,

Reviewed-by: Gabriel Somlo 

Thanks much,
--Gabriel



>From 9b4a835af24ed420f18c6531098d823c98bfa74d Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Tue, 22 Sep 2015 21:10:38 +0200
Subject: [PATCH] Added --convert-file-only option

 * src/convert.c (convert_links_in_hashtable, convert_links):
   test for CO_CONVERT_BASENAME_ONLY.
   (convert_basename): new function.
 * src/convert.h: new constant CO_CONVERT_BASENAME_ONLY.
 * src/init.c, src/main.c, src/options.h: new option "--convert-file-only".
 * doc/wget.texi: updated documentation.

 Reviewed-by: Gabriel Somlo 

---
 doc/wget.texi | 17 +++
 src/convert.c | 96 +--
 src/convert.h |  2 ++
 src/init.c|  2 ++
 src/main.c| 24 +++
 src/options.h |  3 ++
 6 files changed, 136 insertions(+), 8 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index f1244aa..0a139e3 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -2123,6 +2123,23 @@ Note that only at the end of the download can Wget know which links have
 been downloaded.  Because of that, the work done by @samp{-k} will be
 performed at the end of all the downloads.
 
+@item --convert-file-only
+This option converts only the filename part of the URLs, leaving the rest
+of the URLs untouched. This filename part is sometimes referred to as the
+"basename", although we avoid that term here in order not to cause confusion.
+
+It works particularly well in conjunction with @samp{--adjust-extension}, although
+this coupling is not enforced. It proves useful to populate Internet caches
+with files downloaded from different hosts.
+
+Example: if some link points to @file{//foo.com/bar.cgi?xyz} with
+@samp{--adjust-extension} asserted and its local destination is intended to be
+@file{./foo.com/bar.cgi?xyz.css}, then the link would be converted to
+@file{//foo.com/bar.cgi?xyz.css}. Note that only the filename part has been
+modified. The rest of the URL has been left untouched, including the net path
+(@code{//}) which would otherwise be processed by Wget and converted to the
+effective scheme (ie. @code{http://}).
+
 @cindex backing up converted files
 @item -K
 @itemx --backup-converted
diff --git a/src/convert.c b/src/convert.c
index f0df9a0..8e9aa60 100644
--- a/src/convert.c
+++ b/src/convert.c
@@ -46,6 +46,7 @@ as that of the covered work.  */
 #include "html-url.h"
 #include "css-url.h"
 #include "iri.h"
+#include "xstrndup.h"
 
 static struct hash_table *dl_file_url_map;
 struct hash_table *dl_url_file_map;
@@ -136,8 +137,9 @@ convert_links_in_hashtable (struct hash_table *downloaded_set,
  form.  We do this even if the URL already is in
  relative form, because our directory structure may
  not be identical to that on the server (think `-nd',
- `--cut-dirs', etc.)  */
-  cur_url->convert = CO_CONVERT_TO_RELATIVE;
+ `--cut-dirs', etc.). If --convert-file-only was passed,
+ we only convert the basename portion of the URL.  */
+  cur_url->convert = (opt.convert_file_only ? CO_CONVERT_BASENAME_ONLY : CO_CONVERT_TO_RELATIVE);
   cur_url->local_name = xstrdup (local_name);
   DEBUGP (("will convert url %s to local %s\n", u->url, local_name));
 }
@@ -206,6 +208,7 @@ static const char *replace_attr_refresh_hack (const char *, int, FILE *,
   const char *, int);
 static char *local_quote_string (const char *, bool);
 static char *construct_relative (const char *, const char *);
+static char *convert_basename (const char *, const struct urlpos *);
 
 /* Change the links in one file.  LINKS is a list of links in the
document, along with their positions and the desired direction of
@@ -315,11 +318,34 @@ convert_links (const char *file, struct urlpos *links)
 
 DEBUGP ((

Re: [Bug-wget] [PATCH] Re: [RFE / project idea]: convert-links for "transparent proxy" mode

2015-09-30 Thread Ander Juaristi

Hi Gabriel,

Thanks for your helpful comments.

I've corrected the patch.

For me, the most embarrassing thing was having forgotten the commit description 
:D

On 09/24/2015 04:13 PM, Gabriel L. Somlo wrote:

Hi AJ,

Thanks for implementing this! I tested it, and the functionality
appears correct.

I've added a more detailed review inline, below.

+static char *
+convert_basename (const char *p, const struct urlpos *link)
+{
+  int len = link->size;
+  char *url = NULL;
+  char *org_basename = NULL, *local_basename = NULL;
+  char *result = url;


None of the string variables above really need to be initialized,
since you're going to assign them unconditionally below regardless.



I always initialize local variables by default. For me it's good practice.



Consider inverting the test. If basenames are *equal*, return 'url'
immediately, and save an unnecessary xstrdup() + xfree().

Otherwise, call uri_merge() and xfree(url) before returning the
result.



Done.

Finally, I also updated the documentation at doc/wget.texi.



Thanks much,
--Gabriel




Regards,
- AJ
>From 9b4a835af24ed420f18c6531098d823c98bfa74d Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Tue, 22 Sep 2015 21:10:38 +0200
Subject: [PATCH] Added --convert-file-only option

 * src/convert.c (convert_links_in_hashtable, convert_links):
   test for CO_CONVERT_BASENAME_ONLY.
   (convert_basename): new function.
 * src/convert.h: new constant CO_CONVERT_BASENAME_ONLY.
 * src/init.c, src/main.c, src/options.h: new option "--convert-file-only".
 * doc/wget.texi: updated documentation.

 Reported-By: Gabriel L. Somlo 

---
 doc/wget.texi | 17 +++
 src/convert.c | 96 +--
 src/convert.h |  2 ++
 src/init.c|  2 ++
 src/main.c| 24 +++
 src/options.h |  3 ++
 6 files changed, 136 insertions(+), 8 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index f1244aa..0a139e3 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -2123,6 +2123,23 @@ Note that only at the end of the download can Wget know which links have
 been downloaded.  Because of that, the work done by @samp{-k} will be
 performed at the end of all the downloads.
 
+@item --convert-file-only
+This option converts only the filename part of the URLs, leaving the rest
+of the URLs untouched. This filename part is sometimes referred to as the
+"basename", although we avoid that term here in order not to cause confusion.
+
+It works particularly well in conjunction with @samp{--convert-links}, although
+this coupling is not enforced. It proves useful to populate Internet caches
+with files downloaded from different hosts.
+
+Example: if some link points to @file{//foo.com/bar.cgi?xyz} with
+@samp{--convert-links} asserted and its local destination is intended to be
+@file{./foo.com/bar.cgi?xyz.css}, then the link would be converted to
+@file{//foo.com/bar.cgi?xyz.css}. Note that only the filename part has been
+modified. The rest of the URL has been left untouched, including the net path
+(@code{//}) which would otherwise be processed by Wget and converted to the
+effective scheme (ie. @code{http://}).
+
 @cindex backing up converted files
 @item -K
 @itemx --backup-converted
diff --git a/src/convert.c b/src/convert.c
index f0df9a0..8e9aa60 100644
--- a/src/convert.c
+++ b/src/convert.c
@@ -46,6 +46,7 @@ as that of the covered work.  */
 #include "html-url.h"
 #include "css-url.h"
 #include "iri.h"
+#include "xstrndup.h"
 
 static struct hash_table *dl_file_url_map;
 struct hash_table *dl_url_file_map;
@@ -136,8 +137,9 @@ convert_links_in_hashtable (struct hash_table *downloaded_set,
  form.  We do this even if the URL already is in
  relative form, because our directory structure may
  not be identical to that on the server (think `-nd',
- `--cut-dirs', etc.)  */
-  cur_url->convert = CO_CONVERT_TO_RELATIVE;
+ `--cut-dirs', etc.). If --convert-file-only was passed,
+ we only convert the basename portion of the URL.  */
+  cur_url->convert = (opt.convert_file_only ? CO_CONVERT_BASENAME_ONLY : CO_CONVERT_TO_RELATIVE);
   cur_url->local_name = xstrdup (local_name);
   DEBUGP (("will convert url %s to local %s\n", u->url, local_name));
 }
@@ -206,6 +208,7 @@ static const char *replace_attr_refresh_hack (const char *, int, FILE *,
   const char *, int);
 static char *local_quote_string (const char *, bool);
 static char *construct_relative (const char *, const char *);
+static char *convert_basename (const char *, const struct urlpos *);
 
 /* Change the links in one file.  LINKS is a list of links in the
document, along with their positions and the desired direction of
@@ -3

[Bug-wget] [PATCH] Re: [RFE / project idea]: convert-links for "transparent proxy" mode

2015-09-23 Thread Ander Juaristi
Over the last few days Gabriel and I have been working on a new feature called --convert-file-only. 
You can find Gabriel's original message to the mailing list and some additional replies below. I now 
want to introduce you to the final result :D.


We already had --convert-links, which prepends '../'-es to the links in order to keep them valid. 
However, this assumes that the downloaded files will be viewed locally, i.e. with the 'file://' scheme.


What this option does is convert the file portion of the URL (sometimes called the 'basename') 
only, leaving the rest of the URL intact. Basically, it replaces the basename of the original 
URL with the basename of the expected local file. It is especially useful in combination with 
--adjust-extension, and will probably be used with it in most cases, although this is not 
enforced. The main rationale for this was to be able to mirror a site in a web cache, such as Squid 
[1]. You would just `wget -r --convert-file-only --adjust-extension blah` and expect it to work.


So if we execute:

$ wget --recursive --convert-file-only --adjust-extension ...

All the links to the remote file

//foo.com/bar.cgi?xyz

That would've been rewritten to

docroot/foo.com/bar.cgi?xyz.css

Would instead appear as

//foo.com/bar.cgi?xyz.css

This also works for CSSs, so this

background:url(//mirror.ini.cmu.edu/foo.png);

Would result in

background:url(//mirror.ini.cmu.edu/foo.png)

Instead of either

background:url(http://mirror.ini.cmu.edu/foo.png) (<-- default behavior. 
Wget converts net paths)

or

background:url(../mirror.ini.cmu.edu/foo.png) (<-- this will clearly not 
work on a server)

We had doubts on how to call this new option. I initially proposed something like '--basename-only', 
but Gabriel was concerned that although in the Unix world the term 'basename' correctly refers to 
the file portion of a URI (just what we want), this might not be the case in the Wget universe.


[1] http://www.squid-cache.org/
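
For the curious, the heart of the conversion is a simple basename splice. Here is a minimal, 
self-contained sketch of the idea (splice_basename is a hypothetical name; the real 
convert_basename in src/convert.c additionally handles quoting and uses Wget's own memory 
helpers):

  #include <stdlib.h>
  #include <string.h>

  /* Keep the directory part of the original URL and splice in the
     basename of the local file name.  Returns a malloc'ed string the
     caller must free, or NULL on allocation failure.  */
  static char *
  splice_basename (const char *url, const char *local_name)
  {
    const char *url_base = strrchr (url, '/');
    const char *local_base = strrchr (local_name, '/');
    size_t dir_len;
    char *result;

    url_base = url_base ? url_base + 1 : url;
    local_base = local_base ? local_base + 1 : local_name;

    if (strcmp (url_base, local_base) == 0)
      return strdup (url);      /* basenames match: nothing to convert */

    dir_len = (size_t) (url_base - url);
    result = malloc (dir_len + strlen (local_base) + 1);
    if (result)
      {
        memcpy (result, url, dir_len);
        strcpy (result + dir_len, local_base);
      }
    return result;
  }

Fed "//foo.com/bar.cgi?xyz" and a local name ending in "bar.cgi?xyz.css", this yields 
"//foo.com/bar.cgi?xyz.css" - exactly the conversion shown above.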

On 07/05/2015 08:29 PM, Gabriel L. Somlo wrote:

On Sun, Jul 05, 2015 at 07:34:07PM +0200, Ander Juaristi wrote:

Hi Gabriel,

So, if I understood well, you want to keep the modifications made by Wget to
the basename (such as escape the reserved characters) but not touch the
hostname production, right?

So that instead of

 ../../../mirror.ini.cmu.edu/cgi-bin/cssfoo.cgi%3Ftwo.html

would have to be

 http://mirror.ini.cmu.edu/cgi-bin/cssfoo.cgi%3Ftwo.html

For a URI that originally was (BTW, I don't know if omitting the scheme is
correct, but anyway, that's how it was)

 //mirror.ini.cmu.edu/cgi-bin/cssfoo.cgi?two

Thus, without looking at the code (either Wget's original or your proposed
changes), and from a purely algorithmic approach, the original behaviour of
Wget is something like this:

 for each absolute URI found as uri
 loop
 convert_relative(uri)
 escape(uri)
 end loop

And what you want is something like this:

 for each absolute URI found as uri
 loop
 escape(uri)   // keep the URI as-is but escape the reserved 
characters
 end loop

Am I right?


Almost :)

Leaving out escaping the URI (which needs to happen in all cases),
construct_relative() does two things:

 1. modify "basename" according to what --adjust-extension did
to the file name of the document targeted by the URI. I.e.,
foo.cgi?arg1&arg2&arg3  -> foo.cgi\?arg1\&arg2\&arg3.html
^

2. modify "dirname" to compute the correct number of "../"'s
required to back out of the directory hierarchy
representing the current web server before being ready to
enter the target web server's folder hierarchy.
   That, btw, is the part which assumes one always ends up
using "file://" to view the scraped content :)

We do need #1 to still happen, so just escape-ing the URI isn't going
to be enough. When the document containing the URI in question is
being served from a transparent proxy, the client will request the
URI, which then had better match something else also available on the
transparent proxy. When --adjust-extension was used, that something
will have a different basename than what's in the link URI.

Regarding #2, we clearly don't want

 //mirror.ini.cmu.edu/foo...

to be converted into

 ../../../mirror.ini.cmu.edu/foo...

However, we ALSO do not want it converted into

 http://mirror.ini.cmu.edu/foo...

Why would that matter ? Leaving out the schema (i.e., "//host/path")
in a document translates into "client will use the same schema as for
the referencing document which contains the URI". So, if I'm
downloading index.htm

Re: [Bug-wget] --header="Accept-encoding: gzip"

2015-09-22 Thread Ander Juaristi

Hi,

would you tell us the exact URLs you are trying to download?

If you can't post those URLs in public, either set up a test server, or send 
them to me privately.

I think what you need is the -r/--recursive option but I won't be sure unless I 
see the URLs.

Also, bear in mind that Wget does not support any kind of content-coding. This means that 
if the content comes gzipped, Wget won't be able to decompress it as you might expect. So 
passing '--header="Accept-encoding: gzip"' probably won't do what you want.

On 09/22/2015 07:57 PM, andreas wpv wrote:

All,
I am trying to download all files of a webpage - but compressed, if they
come compressed, and regular if not compressed. Get all the files the way a
browser would.

so, this works for the html file itself:

wget --header="Accept-encoding: gzip" "url"

and this for itself works to download all elements:

wget -p -H "url"

So, now I want these combined:

wget -p -H  --header="Accept-encoding: gzip" "url"

Unfortunately this only pulls the html files (because where I pull them
they are compressed), and not all the other scripts and stylesheets and so
on, although at least a few of these are compressed, too.


Ideas, tips?



Regards,
- AJ



Re: [Bug-wget] [PATCH] FTPS support

2015-09-11 Thread Ander Juaristi
Hi,

I merged your first patch on top of mine and it all works well now. It
also passes all the tests. I also removed old code that appeared in my
previous patch (in connect.c) that should _not_ go to production.

You added proxy support for FTPS, which I ended up forgetting.
Well done!

Tell me if you still see something broken.

Regards,
- AJ

On Fri, 2015-09-11 at 10:54 +0200, Tim Ruehsen wrote:
> 
> Hi Ander,
> 
> two things that I found.
> 
> 1. [PATCH 0002] Some FTPS pieces seem to be missing.
> I went through the code by searching for SCHEME_FTP and "ftp" and added FTPS 
> stuff where it was missing. I did not think throughly about what I did - so 
> please just don't apply those changes blindly.
> 
> With those changes, Wget tried to work recursively but hangs on the first 
> PASSIVE data transfer.
> 
> 2. [PATCH 0003 ]'using_data_security' in getftp() is a local variable, reset 
> for each file, and only set if (prot != PROT_CLEAR). I turned the logic and 
> voila, recursion works. Again, think if this change might break something 
> else 
> that I didn't test.
> 
> Regards, Tim

From e8e3033f0ed4a5ffd7ade8325c86ccd3da495fb3 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Thu, 27 Aug 2015 16:32:36 +0200
Subject: [PATCH] Added support for FTPS

 * doc/wget.texi: updated documentation to reflect the new FTPS functionality.
 * src/ftp-basic.c (ftp_greeting): new function to read the server's greeting.
   (ftp_login): greeting code was previously here. Moved to ftp_greeting to
   support FTPS implicit mode.
   (ftp_auth): wrapper around the AUTH TLS command.
   (ftp_ccc): wrapper around the CCC command.
   (ftp_pbsz): wrapper around the PBSZ command.
   (ftp_prot): wrapper around the PROT command.
 * src/ftp.c (get_ftp_greeting): new static function.
   (init_control_ssl_connection): new static function to start SSL/TLS on the
   control channel.
   (getftp): added hooks to support FTPS commands (RFCs 2228 and 4217).
   (ftp_loop_internal): test for new FTPS error codes.
 * src/ftp.h: new enum 'prot_level' with available FTPS protection levels +
   prototypes of previous functions. New flag for enum 'wget_ftp_fstatus' to track
   whether the data channel has some security mechanism enabled or not.
 * src/gnutls.c (struct wgnutls_transport_context): new field 'session_data'.
   (wgnutls_close): free GnuTLS session data before exiting.
   (ssl_connect_wget): save/resume SSL/TLS session.
 * src/http.c (establish_connection): refactor ssl_connect_wget call.
   (metalink_from_http): take into account SCHEME_FTPS as well.
 * src/init.c, src/main.c, src/options.h: new command line/wgetrc options.
   (main): in recursive downloads, check for SCHEME_FTPS as well.
 * src/openssl.c (struct openssl_transport_context): new field 'sess'.
   (ssl_connect_wget): save/resume SSL/TLS session.
 * src/retr.c (retrieve_url): check new scheme SCHEME_FTPS.
 * src/ssl.h (ssl_connect_wget): refactor. New parameter of type 'int *'.
 * src/url.c. src/url.h: new scheme SCHEME_FTPS.
 * src/wget.h: new FTPS error codes.
 * src/metalink.h: support FTPS scheme.
---
 doc/wget.texi   |  37 +
 src/ftp-basic.c | 140 ++---
 src/ftp.c   | 237 +++-
 src/ftp.h   |  21 -
 src/gnutls.c|  35 -
 src/http.c  |   5 +-
 src/init.c  |  10 +++
 src/main.c  |  25 +-
 src/metalink.h  |   2 +-
 src/openssl.c   |  16 +++-
 src/options.h   |   4 +
 src/recur.c |  15 ++--
 src/retr.c  |  15 +++-
 src/ssl.h   |   2 +-
 src/url.c   |  11 ++-
 src/url.h   |   4 +
 src/wget.h  |   3 +-
 17 files changed, 545 insertions(+), 37 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index b27..f1244aa 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -2008,6 +2008,43 @@ this option has no effect.  Symbolic links are always traversed in this
 case.
 @end table
 
+@section FTPS Options
+
+@table @samp
+@item --ftps-implicit
+This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing
+SSL/TLS from the very beginning of the control connection. This option does not send
+an @code{AUTH TLS} command: it assumes the server speaks FTPS and directly starts an
+SSL/TLS connection. If the attempt is successful, the session continues just like
+regular FTPS (@code{PBSZ} and @code{PROT} are sent, etc.).
+Implicit FTPS is no longer a requirement for FTPS implementations, and thus
+many servers may not support it. If @samp{--ftps-implicit} is passed and no explicit
+port number specified, the default port for implicit FTPS, 990, will be used, instead
+of the default port for the "normal" (explicit) FTPS which is the same as that of FTP,
+21.
+
+@item --no-ftps-resume-ssl
+Do not resume the SSL/TLS session in the data channel. 

Re: [Bug-wget] [PATCH] FTPS support

2015-09-11 Thread Ander Juaristi
Yep, I just noticed it was because of using_data_security. What I did was to 
track that in ccon *con. I've got the patch ready, but I'll see if I can merge 
your changes in a while.

Sent from my smartphone. Excuse my brevity.

 Tim Ruehsen escribió 

>On Thursday 10 September 2015 16:05:14 Ander Juaristi Alamos wrote:
>> > Hi Ander,
>> > 
>> > during the last test I realized that --recursive won't work with FTPS.
>> 
>> Hi Tim,
>> 
>> I've been looking through it and I've seen that the '--recursive' option from
>> FTP and the one from FTPS follow different code paths.
>> 
>> The code that triggers that difference is at main.c, line 1832:
>> 
>> if ((opt.recursive || opt.page_requisites)
>>   && (url_scheme (*t) != SCHEME_FTP || url_uses_proxy
>> (url_parsed))) {
>>   ...
>>   /* Turn opt.follow_ftp on in case of recursive FTP retrieval
>> */ if (url_scheme (*t) == SCHEME_FTP)
>> opt.follow_ftp = 1;
>>   ...
>> }
>> 
>> If I replace it to include FTPS also:
>> 
>> if ((opt.recursive || opt.page_requisites)
>>   && ((url_scheme (*t) != SCHEME_FTP && url_scheme (*t) !=
>> SCHEME_FTPS)
>>   || url_uses_proxy (url_parsed)))
>> 
>> {
>>   ...
>>   /* Turn opt.follow_ftp on in case of recursive FTP retrieval
>> */ if (url_scheme (*t) == SCHEME_FTP || url_scheme (*t) == SCHEME_FTPS)
>> opt.follow_ftp = 1;
>>   ...
>> }
>> 
>> Then they both follow the same path. The rationale behind this is that the
>> functionality for FTP should behave exactly the same for FTPS.
>> 
>> However, it hangs when downloading the files... Will look further.
>> 
>> Any thoughts?
>
>Hi Ander,
>
>two things that I found.
>
>1. [PATCH 0002] Some FTPS pieces seem to be missing.
>I went through the code by searching for SCHEME_FTP and "ftp" and added FTPS 
>stuff where it was missing. I did not think thoroughly about what I did - so 
>please just don't apply those changes blindly.
>
>With those changes, Wget tried to work recursively but hangs on the first 
>PASSIVE data transfer.
>
>2. [PATCH 0003 ]'using_data_security' in getftp() is a local variable, reset 
>for each file, and only set if (prot != PROT_CLEAR). I turned the logic and 
>voila, recursion works. Again, think if this change might break something else 
>that I didn't test.
>
>Regards, Tim


Re: [Bug-wget] [PATCH] FTPS support

2015-09-10 Thread Ander Juaristi Alamos
> Hi Ander,
> 
> during the last test I realized that --recursive won't work with FTPS.

Hi Tim,

I've been looking through it and I've seen that the '--recursive' option from FTP 
and the one from FTPS follow different code paths.

The code that triggers that difference is at main.c, line 1832:

if ((opt.recursive || opt.page_requisites)
  && (url_scheme (*t) != SCHEME_FTP || url_uses_proxy (url_parsed)))
{
  ...
  /* Turn opt.follow_ftp on in case of recursive FTP retrieval */
  if (url_scheme (*t) == SCHEME_FTP)
opt.follow_ftp = 1;
  ...
}

If I replace it to include FTPS also:

if ((opt.recursive || opt.page_requisites)
  && ((url_scheme (*t) != SCHEME_FTP && url_scheme (*t) != 
SCHEME_FTPS)
  || url_uses_proxy (url_parsed)))
{
  ...
  /* Turn opt.follow_ftp on in case of recursive FTP retrieval */
  if (url_scheme (*t) == SCHEME_FTP || url_scheme (*t) == 
SCHEME_FTPS)
opt.follow_ftp = 1;
  ...
}

Then they both follow the same path. The rationale behind this is that the 
functionality for FTP should behave exactly the same for FTPS.

However, it hangs when downloading the files... Will look further.

Any thoughts?
 
Regards,

- AJ



Re: [Bug-wget] Feature: Disabling progress bar when wget is backgrounded

2015-09-08 Thread Ander Juaristi
I remember this same issue being raised not so long ago, which ended up in a 
wontfix. However, now that we have something to set up and run, something clear 
to test, maybe the results are acceptable :D

Sent from my smartphone. Excuse my brevity.

 Christian Neukirchen escribió 

>Hi,
>
>Sometimes I start wget, but the remote site is too slow, so I rather
>want to run it in background, however when I simply use job control
>for that, wget will keep spewing the progress bar all over my
>terminal.  I have found the SIGHUP/SIGUSR1 feature to redirect output
>to a log file, but I think the following small patch is even more
>useful, since the progress bar will simply resume when wget is
>foregrounded again (also, the final message is still printed to the
>terminal in any case):
>
>--- src/progress.c
>+++ src/progress.c
>@@ -1179,10 +1179,12 @@ create_image (struct bar_progress *bp, double 
>dl_total_time, bool done)
> static void
> display_image (char *buf)
> {
>-  bool old = log_set_save_context (false);
>-  logputs (LOG_PROGRESS, "\r");
>-  logputs (LOG_PROGRESS, buf);
>-  log_set_save_context (old);
>+  if (tcgetpgrp (fileno (stderr)) == getpid ()) {
>+bool old = log_set_save_context (false);
>+logputs (LOG_PROGRESS, "\r");
>+logputs (LOG_PROGRESS, buf);
>+log_set_save_context (old);
>+  }
> }
> 
> static void
>
>This probably needs some guards for portability to all platforms.
>Only tested on Linux 4.1 so far.
>
>Opinions?
>
>-- 
>Christian Neukirchen    http://chneukirchen.org
>


[Bug-wget] [bug #40426] wget hangs with -r and -O -

2015-09-08 Thread Ander Juaristi
Follow-up Comment #2, bug #40426 (project wget):

This was eventually merged on 2015-08-15, commit
12bae50b28fa20f9ffa3d5b0e88f4ebf51fa6864.

Please, check whether the issue is effectively fixed, and close this.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] [RFE / project idea]: convert-links for "transparent proxy" mode

2015-08-30 Thread Ander Juaristi

Hi,

Since no one expressed either interest or refusal regarding this idea (and I found myself in the 
unexpected situation of having more free time than usual :D), I decided to work on it a bit, which 
I've been doing this week.


After hacking some code following your inline comments, I did several test runs against your provided 
test servers (www.contrib.andrew.cmu.edu/...) and Wget was still processing net paths automatically by 
prefixing the protocol ("http://"). So I thought the problem could be tackled by simply not 
converting net paths ("//") into schemes (i.e. "http://") when transforming the downloaded HTML/CSS 
files.


Sorry if I'm still unable to see through your use case, but I think it all could be solved by simply 
introducing a new switch that prevents that conversion. For example:


$ wget --keep-net-paths ...

So that "//mirror.cmu.edu/..." would not be converted into "http://mirror.cmu.edu/...";. The rest of 
the job (such as #1 in your previous answer) would be done by the other switches, such as 
'--convert-links' itself.


You've got a broader overview than I do. Do you think this is enough?

Thanks.


On 07/05/2015 08:29 PM, Gabriel L. Somlo wrote:

On Sun, Jul 05, 2015 at 07:34:07PM +0200, Ander Juaristi wrote:

Hi Gabriel,

So, if I understood well, you want to keep the modifications made by Wget to
the basename (such as escape the reserved characters) but not touch the
hostname production, right?

So that instead of

 ../../../mirror.ini.cmu.edu/cgi-bin/cssfoo.cgi%3Ftwo.html

would have to be

 http://mirror.ini.cmu.edu/cgi-bin/cssfoo.cgi%3Ftwo.html

For a URI that originally was (BTW, I don't know if omitting the scheme is
correct, but anyway, that's how it was)

 //mirror.ini.cmu.edu/cgi-bin/cssfoo.cgi?two

Thus, without looking at the code (either Wget's original or your proposed
changes), and from a purely algorithmic approach, the original behaviour of
Wget is something like this:

 for each absolute URI found as uri
 loop
 convert_relative(uri)
 escape(uri)
 end loop

And what you want is something like this:

 for each absolute URI found as uri
 loop
 escape(uri)   // keep the URI as-is but escape the reserved 
characters
 end loop

Am I right?


Almost :)

Leaving out escaping the URI (which needs to happen in all cases),
construct_relative() does two things:

 1. modify "basename" according to what --adjust-extension did
to the file name of the document targeted by the URI. I.e.,
foo.cgi?arg1&arg2&arg3  -> foo.cgi\?arg1\&arg2\&arg3.html
^

2. modify "dirname" to compute the correct number of "../"'s
required to back out of the directory hierarchy
representing the current web server before being ready to
enter the target web server's folder hierarchy.
   That, btw, is the part which assumes one always ends up
using "file://" to view the scraped content :)

We do need #1 to still happen, so just escape-ing the URI isn't going
to be enough. When the document containing the URI in question is
being served from a transparent proxy, the client will request the
URI, which then had better match something else also available on the
transparent proxy. When --adjust-extension was used, that something
will have a different basename than what's in the link URI.

Regarding #2, we clearly don't want

 //mirror.ini.cmu.edu/foo...

to be converted into

 ../../../mirror.ini.cmu.edu/foo...

However, we ALSO do not want it converted into

 http://mirror.ini.cmu.edu/foo...

Why would that matter ? Leaving out the schema (i.e., "//host/path")
in a document translates into "client will use the same schema as for
the referencing document which contains the URI". So, if I'm
downloading index.html using https, then a stylesheet link inside
index.html written as "//some-other-host/foo/stylesheet.css" will also
be requested via https.

If you hardcode it to "http://some-other-host/foo/stylesheet.css",
then when loading the referencing index.html via https, the stylesheet
will NOT load, and the document will be displayed all wrong and ugly.

So, in conclusion, we want a "construct_transparent_proxy" specific
function which converts links inside documents corresponding to what
--adjust-extension did to the actual files being referenced, but
WITHOUT touching "dirname" in any way, leaving it the way it was in
the original document.

Hope that makes sense.

Thanks,
--Gabriel



On 06/29/2015 04:03 PM, Gabriel L. Somlo wrote:

Hi,

Below is an idea for an enhancement to wget, which might be a
two-day-ish project for someo

Re: [Bug-wget] Probs downloading secure content on Cygwin/Windows 7/64

2015-08-30 Thread Ander Juaristi

On 08/28/2015 08:30 PM, L Walsh wrote:



 wget

"https://get.adobe.com/flashplayer/download/?installer=FP_18_for_Firefox_-_NPAPI&os=Windows%207&browser_type=Gecko&browser_dist=Firefox&p=mss";

--2015-08-28 11:17:19--
https://get.adobe.com/flashplayer/download/?installer=FP_18_for_Firefox_-_NPAPI&os=Windows%207&browser_type=Gecko&browser_dist=Firefox&p=mss

Resolving webproxy (webproxy)... 192.168.4.1, 192.168.3.1
Connecting to webproxy (webproxy)|192.168.4.1|:8118... connected.
ERROR: The certificate of ‘get.adobe.com’ is not trusted.
ERROR: The certificate of ‘get.adobe.com’ hasn't got a known issuer.
-
I went into my web browser (which doesn't seem to have an issue with the
cert), looked at the security for the page and exported the Security Cert chain
to a ".crt" file.
In Windows, I could click on that to install the cert into Windows' local
store and it was "imported successfully".

But it seems wget still doesn't know how to use the native
machines cert-store.



It works well on my Linux box, which is hardly surprising if you have a quick 
look at the GnuTLS code:

const char *ca_directory;

...

ca_directory = opt.ca_directory ? opt.ca_directory : "/etc/ssl/certs";

Looks like non-Unix-like operating systems were completely forgotten when that 
code was written.
And even if you're on something Unix-like, hard-coding the certificate store to "/etc/ssl/certs" 
seems far from portable to me, but anyway.


However, if you look a couple of lines above, you can see:

#if GNUTLS_VERSION_MAJOR >= 3
  if (!opt.ca_directory)
ncerts = gnutls_certificate_set_x509_system_trust (credentials);
#endif

Which basically tells GnuTLS to load the system's CA files on its own. This is what you want. But 
this only happens when your available GnuTLS library version is greater than or equal to 3. If it's an 
older version, you're f***ed if your system does not have a readable "/etc/ssl/certs" directory with 
CAs in it.


Thus, given your symptoms, one of the following might happen:

1. The GnuTLS library version your Wget installation is linked against is lower than 3. Thus, 
since you haven't specified '--ca-directory' (see solution below), it's trying to locate the system 
CAs in "/etc/ssl/certs", which obviously does not exist on Windows. Or maybe it does (I've never used 
Cygwin so I don't know how its file system works), but for some reason Wget can't find CAs on it.


2. Your compiled GnuTLS library version is 3 or greater, but there's a bug in it 
that prevents it from finding system CAs on Windows.


Now, in either case, it looks like the '--ca-directory' option (that maps to the 'ca_directory' 
variable in the code) painlessly overrides everything. So my suggestion is to do some research on 
where the system CAs are located on Windows, and pass that path to '--ca-directory':


$ wget --ca-directory=C:\foo\certs ...

Let us know if that works :D


Shouldn't it be able to use the native host's cert store automatically,
or is there some extra magic words / switches I should have known to
use?

;-/

Ever since the cert checking was turned on in wget, the only way I've been
able to d/l secure stuff is to tell it to ignore the security, which seems
like it might be counter-productive.

Seems a lot like the standard security problem of it making it so difficult
to use, that people simply create an alias to never check security -- which
can't be better than before when I wasn't taught to turn off security (not
that I usually do, but it seems like that's the direction I'm being "hurded"...
;-)

help?

version info:
law.Bliss> wget --version
GNU Wget 1.16.1 built on cygwin.

+digest +https +ipv6 +iri +large-file +nls +ntlm +opie -psl +ssl/gnutls

Wgetrc:
   /Users/law.Bliss/.wgetrc (user)
   /etc/wgetrc (system)
Locale:
   /usr/share/locale
Compile:
   gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
   -DLOCALEDIR="/usr/share/locale" -I.
   -I/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1/src -I../lib
   -I/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1/lib
   -I/usr/include/uuid -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS
   -DNDEBUG -ggdb -O2 -pipe -Wimplicit-function-declaration
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/build=/usr/src/debug/wget-1.16.1-1
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1=/usr/src/debug/wget-1.16.1-1
Link:
   gcc -I/usr/include/uuid -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS
   -DNDEBUG -ggdb -O2 -pipe -Wimplicit-function-declaration
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/build=/usr/src/debug/wget-1.16.1-1
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1=/usr/src/debug/wget-1.16.1-1
   -liconv -lintl -lpcre -luuid -lnettle -lgnutls -lz -lintl -liconv
   -lp11-kit -lgmp -lhogweed -lgmp -lnettle -ltasn1 -lp11-kit -lz -lz
   -lidn ftp-opie.o gnutls.o http-ntlm.o ../lib/libgnu.a

Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later


Re: [Bug-wget] Unit test case for parse_content_range()

2015-08-30 Thread Ander Juaristi

On 08/30/2015 05:57 PM, Darshit Shah wrote:

I've attached an updated patch with some test cases and a couple of
fixes to the parsing logic.



Hi,

The latest pull fails with the following error:

http.c: In function 'test_parse_range_header':
http.c:4933:3: error: 'for' loop initial declarations are only allowed in C99 
mode
   for (unsigned i = 0; i < countof (test_array); i++)
   ^
http.c:4933:3: note: use option -std=c99 or -std=gnu99 to compile your code

I ran './configure' without arguments (i.e. default flags):

  Compiler:  gcc
  CFlags:-I/usr/include/p11-kit-1   -DHAVE_LIBGNUTLS   -DNDEBUG
  LDFlags:
  Libs:  -lgnutls   -lz
  SSL:   gnutls

Regards,
- AJ



[Bug-wget] Fwd: Re: [PATCH] FTPS support

2015-08-30 Thread Ander Juaristi


The previous patch, slightly amended.

 Forwarded Message 
Subject: Re: [Bug-wget] [PATCH] FTPS support
Date: Fri, 28 Aug 2015 17:40:13 +0200
From: Tim Ruehsen 
To: Ander Juaristi 

Hi Ander,

just not much time.

You won't need
+  if (request)
+xfree (request);

Just write
   xfree (request);

xfree() already tests for non-null.

Tim



>From fd9a23744cd9f7943e39e226319a9b6923489442 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Thu, 27 Aug 2015 16:32:36 +0200
Subject: [PATCH] Added support for FTPS

 * doc/wget.texi: updated documentation to reflect the new FTPS functionality.
 * src/connect.c, src/connect.h (fd_unregister_transport): new function to unregister a
   socket's transport map.
 * src/ftp-basic.c (ftp_greeting): new function to read the server's greeting.
   (ftp_login): greeting code was previously here. Moved to ftp_greeting to
   support FTPS implicit mode.
   (ftp_auth): wrapper around the AUTH TLS command.
   (ftp_ccc): wrapper around the CCC command.
   (ftp_pbsz): wrapper around the PBSZ command.
   (ftp_prot): wrapper around the PROT command.
 * src/ftp.c (get_ftp_greeting): new static function.
   (init_control_ssl_connection): new static function to start SSL/TLS on the
   control channel.
   (getftp): added hooks to support FTPS commands (RFCs 2228 and 4217).
   (ftp_loop_internal): test for new FTPS error codes.
 * src/ftp.h: new enum 'prot_level' with available FTPS protection levels +
   prototypes of previous functions.
 * src/gnutls.c (struct wgnutls_transport_context): new field 'session_data'.
   (wgnutls_close): free GnuTLS session data before exiting.
   (ssl_connect_wget): save/resume SSL/TLS session.
 * src/http.c (establish_connection): refactor ssl_connect_wget call.
 * src/init.c, src/main.c, src/options.h: new command line/wgetrc options.
 * src/openssl.c (struct openssl_transport_context): new field 'sess'.
   (ssl_connect_wget): save/resume SSL/TLS session.
 * src/retr.c (retrieve_url): check new scheme SCHEME_FTPS.
 * src/ssl.h (ssl_connect_wget): refactor. New parameter of type 'int *'.
 * src/url.c. src/url.h: new scheme SCHEME_FTPS.
 * src/wget.h: new FTPS error codes.
---
 doc/wget.texi   |  37 +
 src/ftp-basic.c | 140 +++---
 src/ftp.c   | 229 +++-
 src/ftp.h   |  18 +
 src/gnutls.c|  35 -
 src/http.c  |   2 +-
 src/init.c  |  10 +++
 src/main.c  |  20 +
 src/openssl.c   |  16 +++-
 src/options.h   |   4 +
 src/retr.c  |   6 +-
 src/ssl.h   |   2 +-
 src/url.c   |   7 ++
 src/url.h   |   4 +
 src/wget.h  |   3 +-
 15 files changed, 513 insertions(+), 20 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index d2ff7dc..f0bc379 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1942,6 +1942,43 @@ this option has no effect.  Symbolic links are always traversed in this
 case.
 @end table
 
+@section FTPS Options
+
+@table @samp
+@item --ftps-implicit
+This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing
+SSL/TLS from the very beginning of the control connection. This option does not send
+an @code{AUTH TLS} command: it assumes the server speaks FTPS and directly starts an
+SSL/TLS connection. If the attempt is successful, the session continues just like
+regular FTPS (@code{PBSZ} and @code{PROT} are sent, etc.).
+Implicit FTPS is no longer a requirement for FTPS implementations, and thus
+many servers may not support it. If @samp{--ftps-implicit} is passed and no explicit
+port number specified, the default port for implicit FTPS, 990, will be used, instead
+of the default port for the "normal" (explicit) FTPS which is the same as that of FTP,
+21.
+
+@item --no-ftps-resume-ssl
+Do not resume the SSL/TLS session in the data channel. When starting a data connection,
+Wget tries to resume the SSL/TLS session previously started in the control connection.
+SSL/TLS session resumption avoids performing an entirely new handshake by reusing
+the SSL/TLS parameters of a previous session. Typically, the FTPS servers want it that way,
+so Wget does this by default. Under rare circumstances however, one might want to
+start an entirely new SSL/TLS session in every data connection.
+This is what @samp{--no-ftps-resume-ssl} is for.
+
+@item --ftps-clear-data-connection
+All the data connections will be in plain text. Only the control connection will be
+under SSL/TLS. Wget will send a @code{PROT C} command to achieve this, which must be
+approved by the server.
+
+@item --ftps-fallback-to-ftp
+Fall back to FTP if FTPS is not supported by the target server. For security reasons,
+this option is not asserted by default. The default behaviour is to exit with an error.
+If a server does not successfully reply to the initial @code{AUTH TLS} command, or in the
+case of implicit FTPS, if the initial

Re: [Bug-wget] Multi segment download

2015-08-28 Thread Ander Juaristi
Hi,

Would you point us to some potential use cases? How would a Wget user benefit 
from such a feature? One of the best-regarded features of download managers is 
the ability to resume paused downloads, and that's already supported by Wget. 
Apart from that, I can't come up with any other use case. But that's me; maybe 
you have a broader overview.

Sent from my smartphone. Excuse my brevity.

 Abhilash Mhaisne escribió 

>Hey all. I am new to this mailing list.
>As far as I've used wget, it downloads a specified file as a single segment.
>Can we modify this such that wget will download a file by dividing it into
>multiple
>segments and then combining them all at the receiver host? Just like some
>proprietary download
>managers do? If work on such a feature is going on, I'd like to be a part
>of it.
>
>Thank you!
>Abhilash Mhaisne


[Bug-wget] [PATCH] FTPS support

2015-08-28 Thread Ander Juaristi

Hi all,

Finally, here comes the FTPS patch!

At a glance, the FTPS code triggers whenever a URL with the 'ftps://' scheme is 
entered. It works either in PASV or PORT mode, and most (all?) FTP switches 
should work seamlessly with FTPS as well.

Furthermore, this patch adds 4 new command-line/wgetrc switches to control the 
FTPS behaviour, namely '--ftps-implicit', '--[no-]ftps-resume-ssl', 
'--ftps-clear-data-connection' and '--ftps-fallback-to-ftp'. These have been 
conveniently explained in the docs, in wget.texi.

One of the most significant changes is probably the addition of a new parameter 
to the ssl_connect_wget() function. Now its signature looks like this:

bool ssl_connect_wget (int, const char *, int *);

That last 'int *' parameter is a pointer to a socket descriptor. It can be 
NULL. When a valid socket descriptor is passed, ssl_connect_wget, instead 
of opening an entirely new SSL/TLS session, tries to resume the existing 
SSL/TLS session held over that socket. I understand this may not have been 
the best way of implementing SSL/TLS session resumption (I encourage you to 
debate here), but supporting that functionality was paramount. Probably all the 
FTPS server implementations out there require the client to resume the SSL/TLS 
session of the control connection whenever a data channel is opened. This can 
of course be overridden, but it's usually the default behaviour. So this had 
to be implemented; otherwise it would not work in 99% of the cases.
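
To illustrate the calling convention (a sketch with names borrowed from the patch; 
this is not the exact getftp() code):

  /* Control connection: no session to resume from, so pass NULL. */
  if (!ssl_connect_wget (csock, u->host, NULL))
    return CONSSLERR;

  /* Data connection: hand in the control socket so the SSL/TLS
     session negotiated on csock gets resumed on dtsock. */
  if (!ssl_connect_wget (dtsock, u->host, &csock))
    return CONSSLERR;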

One last move was to add a new method ssl_disconnect_wget(). This was necessary to 
support the "CCC" (RFC 2228) command. However, a simple straightforward 
implementation would leak SSL/TLS session data. In order to avoid this leakage I had to 
do some ugly hacks in connect.c, so yes, in the end I managed to get this feature 
working. But since I didn't like the approach taken, I eventually discarded this option. 
I still feel there's a need for an ssl_disconnect_wget() function (one that closes the underlying 
SSL/TLS session but keeps the socket open), but Tim and I agreed it'd be better to 
leave it to wget2.

Regards,
- AJ
>From 88db0ca70c512380859e900fb0e739d7e22017e1 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Thu, 27 Aug 2015 16:32:36 +0200
Subject: [PATCH] Added support for FTPS

 * doc/wget.texi: updated documentation to reflect the new FTPS functionality.
 * src/ftp-basic.c (ftp_greeting): new function to read the server's greeting.
   (ftp_login): greeting code was previously here. Moved to ftp_greeting to
   support FTPS implicit mode.
   (ftp_auth): wrapper around the AUTH TLS command.
   (ftp_ccc): wrapper around the CCC command.
   (ftp_pbsz): wrapper around the PBSZ command.
   (ftp_prot): wrapper around the PROT command.
 * src/ftp.c (get_ftp_greeting): new static function.
   (init_control_ssl_connection): new static function to start SSL/TLS on the
   control channel.
   (getftp): added hooks to support FTPS commands (RFCs 2228 and 4217).
   (ftp_loop_internal): test for new FTPS error codes.
 * src/ftp.h: new enum 'prot_level' with available FTPS protection levels +
   prototypes of previous functions.
 * src/gnutls.c (struct wgnutls_transport_context): new field 'session_data'.
   (wgnutls_close): free GnuTLS session data before exiting.
   (ssl_connect_wget): save/resume SSL/TLS session.
 * src/http.c (establish_connection): refactor ssl_connect_wget call.
 * src/init.c, src/main.c, src/options.h: new command line/wgetrc options.
 * src/openssl.c (struct openssl_transport_context): new field 'sess'.
   (ssl_connect_wget): save/resume SSL/TLS session.
 * src/retr.c (retrieve_url): check new scheme SCHEME_FTPS.
 * src/ssl.h (ssl_connect_wget): refactor. New parameter of type 'int *'.
 * src/url.c. src/url.h: new scheme SCHEME_FTPS.
 * src/wget.h: new FTPS error codes.
---
 doc/wget.texi   |  37 +
 src/ftp-basic.c | 146 +---
 src/ftp.c   | 229 +++-
 src/ftp.h   |  18 +
 src/gnutls.c|  35 -
 src/http.c  |   2 +-
 src/init.c  |  10 +++
 src/main.c  |  20 +
 src/openssl.c   |  16 +++-
 src/options.h   |   4 +
 src/retr.c  |   6 +-
 src/ssl.h   |   2 +-
 src/url.c   |   7 ++
 src/url.h   |   4 +
 src/wget.h  |   3 +-
 15 files changed, 519 insertions(+), 20 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index d2ff7dc..f0bc379 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1942,6 +1942,43 @@ this option has no effect.  Symbolic links are always traversed in this
 case.
 @end table
 
+@section FTPS Options
+
+@table @samp
+@item --ftps-implicit
+This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing
+SSL/TLS from the very beginning of the control connection. This op

Re: [Bug-wget] Fwd: New Defects reported by Coverity Scan for GNU Wget

2015-08-26 Thread Ander Juaristi

On 08/15/2015 12:11 PM, Darshit Shah wrote:

I just ran coverity scan against the latest git code and it came up
with a bunch of new defects. Maybe we should take a look at them when
possible?




I fixed a memory leak in the HSTS code (function 
'parse_strict_transport_security').

Regards,
- AJ
>From 5a4a45ffc34619e24b9359247fbc72eaeb0d8d74 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Wed, 26 Aug 2015 12:35:02 +0200
Subject: [PATCH] Fix resource leak.

 * src/http.c (parse_strict_transport_security): Freed memory to avoid resource leak.
   Comply with GNU coding style.
---
 src/http.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/http.c b/src/http.c
index 834f59d..69d87cd 100644
--- a/src/http.c
+++ b/src/http.c
@@ -1272,12 +1272,12 @@ parse_strict_transport_security (const char *header, time_t *max_age, bool *incl
 {
   /* Process the STS header. Keys should be matched case-insensitively. */
   for (; extract_param (&header, &name, &value, ';', &is_url_encoded); is_url_encoded = false)
-  {
-	if (BOUNDED_EQUAL_NO_CASE(name.b, name.e, "max-age"))
-	  c_max_age = strdupdelim (value.b, value.e);
-	else if (BOUNDED_EQUAL_NO_CASE(name.b, name.e, "includeSubDomains"))
-	  is = true;
-  }
+{
+  if (BOUNDED_EQUAL_NO_CASE (name.b, name.e, "max-age"))
+c_max_age = strdupdelim (value.b, value.e);
+  else if (BOUNDED_EQUAL_NO_CASE (name.b, name.e, "includeSubDomains"))
+is = true;
+}
 
   /* pass the parsed values over */
   if (c_max_age)
@@ -1291,10 +1291,11 @@ parse_strict_transport_security (const char *header, time_t *max_age, bool *incl
 	  if (include_subdomains)
 	*include_subdomains = is;
 
-	  DEBUGP(("Parsed Strict-Transport-Security max-age = %s, includeSubDomains = %s\n",
+	  DEBUGP (("Parsed Strict-Transport-Security max-age = %s, includeSubDomains = %s\n",
 		 c_max_age, (is ? "true" : "false")));
 
-	  success = true;
+  xfree (c_max_age);
+  success = true;
 	}
   else
 	{
-- 
1.9.1



Re: [Bug-wget] [PATCH] Update HSTS info

2015-08-18 Thread Ander Juaristi

Hi Daniel,

I've amended the previous patches based on your feedback. Well, the first patch 
only. I re-attach the second one without modifications.

On 08/10/2015 06:06 PM, Daniel Kahn Gillmor wrote:


There is no pointer or reference here to what the "correct HSTS database
format used by Wget" is.  providing a brief pointer (e.g. "look at file
doc/FOO in the wget source" or "see https://example.org/wget-hsts-db";)
would be useful.



I have now added a concise description of how the HSTS database should 
look.


Is it possible that a user would want to decouple reading from
(respecting) an HSTS file and writing to it?


During development, neither Tim nor I ever talked about separating read and 
write. It just didn't happen.

But I guess you're right: there should be a way of separating reads and writes. 
The default behaviour should be kept the same, but with an option to decouple 
the two, as you say. It should not be prohibitively expensive, and someone 
might benefit from such a feature.

I've been thinking about a couple of possibilities here, based on the examples 
you provided. Discussion starts here ;-)

* Make the HSTS database read-only. Load the HSTS entries contained there 
but do not rewrite the file. This could be governed by an extra command-line 
switch, like '--hsts-read-only'. It could be passed together with 
'--hsts-file', for example:

wget --hsts-read-only --hsts-file=~/my-fixed-hsts-file

* Do not use an HSTS database at all. HSTS would be handled internally but 
there would be no interaction with the file system. Something like 
'--no-hsts-file'.
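
For example:

wget --no-hsts-file 'https://example.com'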

* 


This sounds like there might still be a race condition where one
process's changes get clobbered.  can we make any atomicity guarantees
for users who might be concerned about this?


You're right. My fault for not taking this into account. This could be fixed 
with flock/fcntl. I think they're both in gnulib.
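
Something along these lines, perhaps (a minimal sketch assuming plain POSIX 
flock(); not actual Wget code, and gnulib would supply the portable wrapper):

  #include <stdbool.h>
  #include <stdio.h>
  #include <sys/file.h>

  /* Hold an exclusive advisory lock while the HSTS database is
     re-read, merged and rewritten, so that concurrent Wget processes
     cannot clobber each other's changes.  */
  static bool
  hsts_store_save_locked (const char *filename)
  {
    FILE *fp = fopen (filename, "r+");
    bool ok = false;

    if (!fp)
      return false;

    if (flock (fileno (fp), LOCK_EX) == 0)
      {
        /* ... re-read the file, merge the in-memory entries,
           and rewrite the file here ... */
        ok = true;
        flock (fileno (fp), LOCK_UN);
      }

    fclose (fp);
    return ok;
  }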

These last two issues require code changes. I'll take responsibility for fixing 
them, but outside of GSoC. The first one requires a bit of consensus before 
coding anything, but the second one seems more straightforward. For now, I 
attach the two previous patches. The first one (the HSTS docs) is amended with 
dkg's suggestions. If there are no more complaints, I think they can be pushed, 
because Wget's behaviour has not changed yet. When we implement any of the 
ideas proposed above, we'll update the docs.


I'd normally characterize these threats as "privacy concerns", not
necessarily "security concerns".  users of wget might understand them
most closely as offering "cookie-equivalent" mechanisms for some
http/https clients.



I think that's a widespread misconception. It's true that many users map HSTS to HTTP 
cookies, but IMO that's a mistake. HTTP cookies and HSTS might be similar in some 
respects, but they're two mechanisms designed for different purposes. HTTP cookies 
bridge the gap between HTTP's traditionally stateless nature and the stateful needs of 
modern Internet users. HSTS, on the other hand, was conceived as a countermeasure 
against certain security threats. It's true that those threats might more specifically 
target privacy, but I think it's an error for us, GNU developers, to keep feeding the 
"HSTS == cookie" misconception.


Regards,
- AJ
From a62468f652cc5ba94809e55bb9b187e38b05a9c4 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Sat, 8 Aug 2015 19:40:49 +0200
Subject: [PATCH 1/2] Extra debug traces for HSTS.

 * src/main.c (load_hsts, save_hsts): added DEBUGP() calls to signal
   reads and saves of the HSTS database file.
---
 src/main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/main.c b/src/main.c
index 61accfe..c6944d9 100644
--- a/src/main.c
+++ b/src/main.c
@@ -174,6 +174,8 @@ load_hsts (void)
 
   if (filename)
 {
+  DEBUGP (("Reading HSTS entries from %s\n", filename));
+
   hsts_store = hsts_store_open (filename);
 
   if (!hsts_store)
@@ -195,6 +197,9 @@ save_hsts (void)
 {
   char *filename = get_hsts_database ();
 
+  if (filename)
+DEBUGP (("Saving HSTS entries to %s\n", filename));
+
   hsts_store_save (hsts_store, filename);
   hsts_store_close (hsts_store);
 
-- 
1.9.1

From 4dbe3e40e7c8dfdbcbcd6539d1c2f6b5605c7d40 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Tue, 18 Aug 2015 00:45:36 +0200
Subject: [PATCH 2/2] Updated HSTS documentation

 * doc/wget.texi: updated HSTS documentation.

   Reported-by: Daniel Kahn Gillmor 
---
 doc/wget.texi | 70 +--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index d2ff7dc..e3d2b46 100

Re: [Bug-wget] URL rewriting when resource name is in a variable

2015-08-17 Thread Ander Juaristi

On 08/17/2015 05:04 AM, Murali Ramanujam wrote:

Hi


When I try to save the attached HTML page (original.html) with  wget -k -p, the 
resource path gets mangled in the rewrite process (saved.html is output of 
wget).



If I understood you well, you execute

wget -k -p http://localhost/original.html

and get a file called 'saved.html' as a result?!

What do you mean by "resource name is in a variable"? What's the exact command 
you executed on the console?




The saved page gives a SyntaxError when opened. In addition, it looks 
like wget is trying to rewrite the src to http://localhost/loc but it should be 
http://localhost/script.js.


Script.js is just console.log(1);


Is this a valid bug? Is there any way to get expected behaviour from wget in 
such a situation?




Could you pass the flag '--debug' and post the output?
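
For instance:

wget --debug -k -p http://localhost/original.html 2> wget-debug.log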




Thanks
Murali




Regards,
- AJ



Re: [Bug-wget] [PATCH] bad filenames (again)

2015-08-16 Thread Ander Juaristi

On 08/17/2015 01:31 AM, Ander Juaristi wrote:

On 08/15/2015 08:53 AM, Darshit Shah wrote:

I guess this issue is now closed? We should document libgpgme11-dev as
a dependency.



Everything works OK here. Even without libgpgme-dev.

Check out the attached patch.

Regards,
- AJ


Sorry Darshit, I had no intention of stepping on your toes. I didn't notice you 
had already proposed a patch.


However, I think you should've mentioned libmetalink[1] as well.

[1] https://launchpad.net/libmetalink

Regards,
- AJ



Re: [Bug-wget] [PATCH] bad filenames (again)

2015-08-16 Thread Ander Juaristi

On 08/15/2015 08:53 AM, Darshit Shah wrote:

I guess this issue is now closed? We should document libgpgme11-dev as
a dependency.



Everything works OK here. Even without libgpgme-dev.

Check out the attached patch.

Regards,
- AJ
From 52ad32eb0846d19508974158bd26695d5dae8cc4 Mon Sep 17 00:00:00 2001
From: Ander Juaristi 
Date: Mon, 17 Aug 2015 01:26:20 +0200
Subject: [PATCH] Documented Metalink dependencies

---
 README.checkout | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/README.checkout b/README.checkout
index 03463d1..2d5af79 100644
--- a/README.checkout
+++ b/README.checkout
@@ -94,6 +94,10 @@ Compiling From Repository Sources
saved the .pc file. Example:
$ PKG_CONFIG_PATH="." ./configure
 
+ * [46]libmetalink and [47]GPGME are required for Metalink support.
+   If any of these are not found by the ./configure script, it will
+   still succeed, but Wget will be compiled without Metalink support.
+
 
For those who might be confused as to what to do once they check out
the source code, considering configure and Makefile do not yet exist at
@@ -200,3 +204,5 @@ References
   43. http://validator.w3.org/check?uri=referer
   44. http://wget.addictivecode.org/WikiLicense
   45. https://www.python.org/
+  46. https://launchpad.net/libmetalink
+  47. https://www.gnupg.org/related_software/gpgme/
-- 
1.9.1



Re: [Bug-wget] recur.c compile error

2015-08-15 Thread Ander Juaristi



On 08/15/2015 12:23 PM, Gisle Vanem wrote:


Someone please fix MailMan to make it easier to reply to this
list. Like it used to be some years ago.



Maybe this should go to the Savannah guys?

--
Regards,
- AJ



Re: [Bug-wget] [Patch] fix bug #44516, -o- log to stdout

2015-08-15 Thread Ander Juaristi



On 08/15/2015 09:30 AM, Darshit Shah wrote:

I don't think we ever merged this patch.

Any arguments against it? Else I'll go ahead and merge it in a day or two.



ACK from me.

Now that someone has dusted this off, you can also merge #40426, which is 
somewhat related.

It looks like the definitive solution for now is not to allow -O- and -r 
together. The two patches provided both take that approach. I would merge the 
first one, proposed by Daniele Calore on Nov 11, 2013. It's more complete, in 
my opinion.


On Mon, Mar 16, 2015 at 2:47 AM, Giuseppe Scrivano  wrote:

Miquel Llobet  writes:


Yeah, I'll be using git format patch from now on. Do you prefer
pasting the commit on the message or attaching a file?


shouldn't make a difference, but just attach it, less risky to be
changed by the mail client.

Thanks,
Giuseppe







--
Regards,
- AJ



Re: [Bug-wget] [PATCH] Use u8_check() instead our own utf8 checking

2015-08-15 Thread Ander Juaristi

Hi,

On 08/15/2015 08:42 AM, Darshit Shah wrote:

I think we should go with Angel's patch. configure.ac should check for
the libidn version and decide which code to use.



I missed that this could be done. But yes, if it's possible, this looks like 
the best option to me.


We should use the fixed libidn code for all updated machines, but
there will always be some ancient machines that will not update; let's
support Angel's patch on those.
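
A rough sketch of what that configure.ac check could look like (the version 
bound and the macro name are illustrative, not actual code):

  dnl Prefer the system libidn when it already carries the utf8 fix;
  dnl otherwise fall back to our own checking code (Angel's patch).
  PKG_CHECK_MODULES([LIBIDN], [libidn >= 1.31],
    [AC_DEFINE([HAVE_FIXED_LIBIDN], [1],
               [Define if libidn validates utf8 itself])],
    [AC_MSG_NOTICE([libidn older than 1.31; using the bundled utf8 check])])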



Regards,
- AJ



[Bug-wget] [bug #45689] wget: opens a new connection for each ftp document

2015-08-15 Thread Ander Juaristi
Follow-up Comment #2, bug #45689 (project wget):

Hi Darshit,

This is not the same concept as persistent connections in HTTP (those with
Connection: Keep-Alive).

What happens here is that if you download two documents from the same FTP
server (e.g. `wget ftp://site.com/file1.txt ftp://site.com/file2.txt`), Wget
will open a connection, download the file, and close the connection again for
each of the files. This is inefficient, given that both files are stored on the
same server (i.e. site.com). It'd be better to open the connection to site.com
once, send the two RETRs over the same control connection, and then close it.
It's a simple concept, really.
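
On the wire that would look roughly like this (server replies omitted; a new
data connection is still negotiated per file, only the control connection is
reused):

  USER anonymous
  PASS guest@example.com
  PASV
  RETR file1.txt
  PASV
  RETR file2.txt
  QUIT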

This happens because the traversal is done in recur.c, whereas the actual
connect-download-close cycle is done in ftp.c, and there's no relationship
between the two.

This issue was among my favorites, but I won't be able to work on it for at
least a month, so feel free to fix it if you want :D

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] [PATCH] Update HSTS info

2015-08-15 Thread Ander Juaristi
No worries ;-)

As I told Tim off list, I'm working on the improvements.

Sent from my smartphone. Excuse my brevity.

 Darshit Shah escribió 

>Hi Ander,
>
>Could you please take a look at dkg's suggestions and recreate a patch?
>
>On Mon, Aug 10, 2015 at 9:36 PM, Daniel Kahn Gillmor
> wrote:
>> Hi Ander--
>>
>> Thanks for this!  having the concise, human-readable description of the
>> intended behavior is really useful.
>>
>> a couple comments on the proposed documentation below:
>>
>> On Sat 2015-08-08 13:50:17 -0400, Ander Juaristi wrote:
>>
>>> Those two patches add two extra debug traces for HSTS, and most 
>>> importantly, update the documentation to include information about HSTS.
>>  [...]
>>> +@item --hsts-file=@var{file}
>>> +By default, Wget stores its HSTS database in @file{~/.wget-hsts}.
>>> +You can use @samp{--hsts-file} to override this. Wget will use
>>> +the supplied file as the HSTS database. Such file must conform to the
>>> +correct HSTS database format used by Wget. If Wget cannot parse the
>>> +provided file, the behaviour is unspecified.
>>
>> There is no pointer or reference here to what the "correct HSTS database
>> format used by Wget" is.  providing a brief pointer (e.g. "look at file
>> doc/FOO in the wget source" or "see https://example.org/wget-hsts-db";)
>> would be useful.
>
>I agree. We should supply a sample hsts config file for the users,
>similar to the wgetrc file that is available in the tarball.
>
>>
>>> +Be aware though, that Wget may modify the provided file if any change occurs
>>> +between the HSTS policies requested by the remote servers and those in the
>>> +file. When Wget exits, it effectively updates the HSTS database by rewriting
>>> +the database file with the new entries.
>>
>> Is it possible that a user would want to decouple reading from
>> (respecting) an HSTS file and writing to it?  some examples:
>>
>>  * A wget process might have no ability to modify the filesystem (e.g.
>>    a constrained wget -O- in a pipeline), but wants to respect a
>>    system-provided HSTS ruleset.  This might want to read an hsts file
>>    without trying to write to it (or at least without raising an error
>>    on an attempted modification).
>>
>This also allows system admins to create a hsts config file in a
>location where the current user does not have write-access.
>
>>  * A wget process used as part of network testing suite might want to
>>    disobey any system-provided HSTS rules (it needs to emulate behavior
>>    of "ignorant" clients), but might also want to collect the HSTS rules
>>    it sees for later review.  This might want to update an hsts file
>>    without trying to respect its contents.
>>
>> I don't know if these use cases warrant the additional option complexity
>> of accomodating them, but it'd be good to make that decision explicitly
>> (sorry if that's already been done and i've just missed the discussion.
>>
>>> +Care is taken not to override possible changes made by other Wget processes at
>>> +the same time over the HSTS database. Before dumping the updated HSTS entries
>>> +on the file, Wget will re-read it and merge the changes.
>>
>> This sounds like there might still be a race condition where one
>> process's changes get clobbered.  can we make any atomicity guarantees
>> for users who might be concerned about this?
>>
>>> +Using a custom HSTS database and/or modifying an existing one is discouraged.
>>> +For more information about the potential security threats arising from such
>>> +practice, see section 14 "Security Considerations" of RFC 6797, especially
>>> +section 14.9 "Creative Manipulation of HSTS Policy Store".
>>
>> I'd normally characterize these threats as "privacy concerns", not
>> necessarily "security concerns".  users of wget might understand them
>> most closely as offering "cookie-equivalent" mechanisms for some
>> http/https clients.
>>
>> sorry these questions and considerations are in text form and not as
>> patches.  feel free to take from them whatever you might find useful.
>>
>> Regards,
>>
>> --dkg
>>
>
>
>
>-- 
>Thanking You,
>Darshit Shah


[Bug-wget] [bug #33967] resuming an FTP download shows wrong progreess bar and percentage

2015-08-14 Thread Ander Juaristi
Follow-up Comment #1, bug #33967 (project wget):

This is no longer reproducible in Wget 1.15. It was fixed with commits
6ef8c69c9d34bbb750ddcd2062e668fb0d941cff and
29305e059ff2095060eef939f653c1b5deecd6d9.

This should be closed.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #33838] want a way to turn off verbosity but still have a progress bar

2015-08-14 Thread Ander Juaristi
Follow-up Comment #1, bug #33838 (project wget):

From Wget 1.16 on, you have --show-progress, which does exactly what you
want.
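
For instance, to get only the progress bar and nothing else (URL is just a
placeholder):

wget -q --show-progress https://example.com/file.iso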

> * Changes in Wget 1.16
>
> ** Introduce --show-progress to force display the progress bar.

I think this should be closed.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #23281] Consider using custom facility for password-prompt

2015-08-14 Thread Ander Juaristi
Follow-up Comment #6, bug #23281 (project wget):

> @Ander Would you like to contact upstream? (We are not in a hurry.)

Yep, I guess I could do that. Ask me for feedback if I haven't reported
anything on this for a while.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] [PATCH] Use u8_check() instead our own utf8 checking

2015-08-14 Thread Ander Juaristi



On 07/12/2015 04:28 PM, Tim Rühsen wrote:

libidn 1.31 fixes this issue. (It's already available for Debian unstable.)

Should we simply revert the (my) patch/commit or is it worth checking for
libidn <= 1.30 in configure.ac. If the second, I would like to see Ángel's
patch going into Wget.

Any thoughts ?



I would go for the first one (apply Angel's patch).

I can confirm that, at the time of writing, libidn has recently released 
version 1.32 (on 2015-08-01). It's available in unstable, as you say. But I 
think it's too early to link against it yet.


Tim



Regards,
- AJ



Re: [Bug-wget] Truncated files ... --ignore-length bug ??

2015-08-11 Thread Ander Juaristi

Hi,

Could you run it with the --debug flag and post the output?

It looks like the MP4 file is being streamed over HTTP.
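
For instance (with your real cookie file and URL):

wget --debug --load-cookies cookies.txt 'http://www.site.com/path/video' 2> wget-debug.log

The log will show the exact headers exchanged and any redirects involved.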

On 08/11/2015 10:03 PM, R. wrote:

Hello

I'm trying to wget a collection of (big) mp4 files.
I have every file URL available, say "http://www.site.com/path/video";.
The site uses a session cookie for authentication.
I have managed to dump this cookie to a local file and use this file when 
connecting to the server for the download process.

I'm using this command line:
wget --load-cookies cookies.txt http://www.site.com/path/somewhere/video

The download seems to proceed fine but eventually I get less than half the 
video ..
Tried many other videos with the same results.

With Firefox, using page information, I can see the media named "video", which 
is an mp4 file. If I click "save as", Firefox will download the whole file without a problem.

I tried curl instead of wget: same bad result.

So I tried many options from wget's man page, like --ignore-length and others, 
to no avail.

wget --ignore-length -S output is:
   HTTP/1.1 200 OK
   Date: Tue, 11 Aug 2015 19:57:59 GMT
   Server: Apache
   Vary: Host
   Expires: Sun, 19 Nov 1978 05:00:00 GMT
   Last-Modified: Tue, 11 Aug 2015 19:57:59 GMT
   Cache-Control: store, no-cache, must-revalidate
   Cache-Control: post-check=0, pre-check=0
   P3P: CP="NOI ADM DEV COM NAV OUR STP"
   X-Debug-Data: 21450
   X-Content-Duration: 502.109
   Content-Location: patch/video
   X-Bandwidth-Control: On
   X-Video-Bitrate: 28064.4
   Accept-Ranges: bytes
   Content-Length: 2925253
   Connection: close
   Content-Type: video/mp4
   Longueur: ignoré [video/mp4]

If I do wget -S without --ignore-length the last line becomes "Longueur: 2925253 
(2,8M) [video/mp4]" which is the exact length of the truncated file wget manages to 
download !!

Is wget ignoring my "--ignore-length" flag ??

Sorry for this long message and my poor English .. I'm French and totally 
confused by this problem ..

Thanks for your help.




--
Regards,
- AJ


