Re: Timeout never happens with https://www.tesco.com

2020-12-07 Thread Вячеслав Петрищев


08.12.2020 8:28, Tom Crane пишет:


I suspect you use OpenSSL for TLS and wget is missing the fcntl() 
call from

src/openssl.c.


That was it.  Checking the Slackware distro's build script 
http://slackware.osuosl.org/slackware64-current/source/n/wget/wget.SlackBuild 
shows it specifies,


Release 1.20.3 doesn't suppose fcntl() call, when ssl=openssl, it uses 
run_with_timeout().
And there was/is a bug in src/openssl.c: openssl_peek() - SSL_peek() is 
called directly wo run_with_timeout() - reason of "the hang"


In current trunk src this has been fixed:

-

./configure --with-ssl=openssl

write(2, "\n---request begin---\nGET /groceries/en-GB/products/280200433 
HTTP/1.1\r\nUser-Agent: Wget/1.20.3.206-794b7\r\nAccept: 
*/*\r\nAccept-Encoding: identity\r\nHost: www.tesco.com\r\nConnection: 
Keep-Alive\r\n\r\n---request end---\n", 211

---request begin---
GET /groceries/en-GB/products/280200433 HTTP/1.1
User-Agent: Wget/1.20.3.206-794b7
Accept: */*
Accept-Encoding: identity
Host: www.tesco.com
Connection: Keep-Alive

---request end---
) = 211
_newselect(5, NULL, [4], NULL, {tv_sec=1, tv_usec=0}) = 1 (out [4], left 
{tv_sec=0, tv_usec=94})

write(4, "\...", 194) = 194
ioctl(0, TIOCGPGRP, [20708])    = 0
getpgrp()   = 20708
write(2, "HTTP request sent, awaiting response... ", 40HTTP request 
sent, awaiting response... ) = 40

fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=1225, tv_nsec=8033473}) = 0
read(4, 0x1d70e03, 5)   = -1 EAGAIN (Resource 
temporarily unavailable)
_newselect(5, [4], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [4], left 
{tv_sec=0, tv_usec=946914})

clock_gettime64(CLOCK_MONOTONIC, {tv_sec=1225, tv_nsec=61924252}) = 0
read(4, "\27\3\3\1\n", 5)   = 5
read(4, "f\...", 266) = 266
read(4, "\27\3\3\1\n", 5)   = 5
read(4, "\...", 266) = 266
read(4, 0x1d70e03, 5)   = -1 EAGAIN (Resource 
temporarily unavailable)

_newselect(5, [4], NULL, NULL, {tv_sec=0, tv_usec=946109}) = 0 (Timeout)
fcntl64(4, F_SETFL, O_RDWR) = 0
ioctl(0, TIOCGPGRP, [20708])    = 0
getpgrp()   = 20708
write(2, "Read error (Connection timed out) in headers.\n", 46Read error 
(Connection timed out) in headers.

) = 46
write(4, "\27\3\3\0\23\307\257=\26\16ha\2204\310*R\302\225'_d\205\360", 
24) = 24

close(4)    = 0
ioctl(0, TIOCGPGRP, [20708])    = 0
getpgrp()   = 20708
write(2, "Closed 4/SSL 0x01d6a310\n", 24Closed 4/SSL 0x01d6a310
) = 24
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1229, ...}) = 0
ioctl(0, TIOCGPGRP, [20708])    = 0
getpgrp()   = 20708
write(2, "Giving up.\n\n", 12Giving up.

)  = 12
close(3)    = 0
exit_group(4)   = ?
+++ exited with 4 +++

-

./configure --with-ssl=openssl CPPFLAGS="-DOPENSSL_RUN_WITHTIMEOUT"

write(2, "\n---request begin---\nGET /groceries/en-GB/products/280200433 
HTTP/1.1\r\nUser-Agent: Wget/1.20.3.206-794b7\r\nAccept: 
*/*\r\nAccept-Encoding: identity\r\nHost: www.tesco.com\r\nConnection: 
Keep-Alive\r\n\r\n---request end---\n", 211

---request begin---
GET /groceries/en-GB/products/280200433 HTTP/1.1
User-Agent: Wget/1.20.3.206-794b7
Accept: */*
Accept-Encoding: identity
Host: www.tesco.com
Connection: Keep-Alive

---request end---
) = 211
_newselect(5, NULL, [4], NULL, {tv_sec=1, tv_usec=0}) = 1 (out [4], left 
{tv_sec=0, tv_usec=93})

write(4, "\...", 194) = 194
ioctl(0, TIOCGPGRP, [10869])    = 0
getpgrp()   = 10869
write(2, "HTTP request sent, awaiting response... ", 40HTTP request 
sent, awaiting response... ) = 40

rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {sa_handler=0x454ba0, sa_mask=[ALRM], 
sa_flags=SA_RESTART}, {sa_handler=SIG_DFL, sa_mask=[ALRM], 
sa_flags=SA_RESTART}, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, 
it_value={tv_sec=1, tv_usec=0}}, NULL) = 0

read(4, "\27\3\3\1\n", 5)   = 5
read(4, "s\...", 266) = 266
read(4, "\27\3\3\1\n", 5)   = 5
read(4, "~\...", 266) = 266
read(4, 0x1e0edf3, 5)   = ? ERESTARTSYS (To be restarted 
if SA_RESTART is set)

--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=SIG_DFL, sa_mask=[ALRM], 
sa_flags=SA_RESTART}, {sa_handler=0x454ba0, sa_mask=[ALRM], 
sa_flags=SA_RESTART}, 8) = 0

ioctl(0, TIOCGPGRP, [10869])    = 0
getpgrp()   = 10869
write(2, "Read error (Connection timed out) in headers.\n", 46Read error 
(Connection timed out) 

Re: Timeout never happens with https://www.tesco.com

2020-12-07 Thread Jeffrey Walton
On Mon, Dec 7, 2020 at 9:29 PM Tom Crane  wrote:
>
> On Mon, 7 Dec 2020, Petr Pisar wrote:
>
> > V??Mon, Dec 07, 2020 at 03:44:32AM +,??Tom Crane napsal(a):
> ...
> >
> > I suspect you use OpenSSL for TLS and wget is missing the fcntl() call from
> > src/openssl.c.
>
> That was it.  Checking the Slackware distro's build script
> http://slackware.osuosl.org/slackware64-current/source/n/wget/wget.SlackBuild
> shows it specifies,
>
> ./configure --with-ssl=openssl

There's also --with-libssl-prefix when OpenSSL is not in a standard
location. I believe the same option is needed when using an updated
GnuTLS.

(I build against OpenSSL because OpenSSL is easier to update. OpenSSL
has fewer dependencies than GnuTLS. I don't know how
--with-libssl-prefix behaves with GnuTLS).

You might need --with-libssl-prefix on a system like OS X. OS X
provides an antique version OpenSSL 0.9.x, so you usually need to
build and install OpenSSL 1.1.x.

For example:

# OS X 10.5 (PowerMac)
$ /usr/bin/openssl version
OpenSSL 0.9.7l 28 Sep 2006

# OS X 10.9
$ /usr/bin/openssl version
OpenSSL 0.9.8zg 14 July 2015

# OS X 10.12
$ /usr/bin/openssl version
OpenSSL 0.9.8zh 14 Jan 2016

Jeff



Re: Timeout never happens with https://www.tesco.com

2020-12-07 Thread Tom Crane

On Mon, 7 Dec 2020, Petr Pisar wrote:


V??Mon, Dec 07, 2020 at 03:44:32AM +,??Tom Crane napsal(a):


[analysis cut]

Many thanks for your analysis.



I suspect you use OpenSSL for TLS and wget is missing the fcntl() call from
src/openssl.c.


That was it.  Checking the Slackware distro's build script 
http://slackware.osuosl.org/slackware64-current/source/n/wget/wget.SlackBuild 
shows it specifies,


./configure --with-ssl=openssl

Changing that to '--with-ssl=gnutls' [default] fixed the problem.

Best regards
Tom.



Re: Timeout never happens with https://www.tesco.com

2020-12-07 Thread Вячеслав Петрищев


07.12.2020 9:44, Tom Crane пишет:

On Sun, 6 Dec 2020, Petr Pisar wrote:


V Sun, Dec 06, 2020 at 06:48:13PM +, Tom Crane napsal(a):
tmp$ strace -s999 -f -e trace=network wget -S -v 
--no-check-certificate --waitretry=1 --tries=1 --timeout=1 
--read-timeout=1 --wait=1 -O /tmp/280200433.tmp 
https://www.tesco.com/groceries/en-GB/products/280200433

[...]

connect(4, {sa_family=AF_INET, sin_port=htons(443),
sin_addr=inet_addr("104.82.193.249")}, 16) = 0
connected.
HTTP request sent, awaiting response...

but the timeout never happens,
Do not restrict strace to network syscalls. We cannot see where it 
blocks.

The timeout should occur in select() syscall.

-- Petr



Sure and thanks for the follow-up.  Please see,

https://www.mklab.rhul.ac.uk/~tom/tempwget/wget-strace.log

After waiting several minutes I -C out of the strace/wget session.

Out of interest, can you reproduce the hang with that URL on your 
system/wget version?


Tom.



timeout happens

..

---request begin---
GET /groceries/en-GB/products/280200433 HTTP/1.1
User-Agent: Wget/1.20.3.206-794b7
Accept: */*
Accept-Encoding: identity
Host: www.tesco.com
Connection: Keep-Alive

---request end---
) = 211
_newselect(5, NULL, [4], NULL, {tv_sec=1, tv_usec=0}) = 1 (out [4], left 
{tv_sec=0, tv_usec=94})
sendmsg(4, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="\27\3\3\0\275\16\236S\263\307\7k\274\357\371\340\273\10x\264s\336\206\335WY!5\363w\207\255X\211\230\21\342\245\254\7\265\266\277t\335\2349\204o\243\5\373eE\240\235\210\224\4<,g\256aE\327z\4\206\33\346\320\371\344\350\204\351N,z\203\264\4\\kN\205\363\324\251\366F\363s\332\272\345\355|]\274\202\35\32\210;Y\262", 
iov_len=194}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 194

ioctl(0, TIOCGPGRP, [21671])    = 0
getpgrp()   = 21671
write(2, "HTTP request sent, awaiting response... ", 40HTTP request 
sent, awaiting response... ) = 40

fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3245, tv_nsec=533340227}) = 0
recv(4, "\27\3\3\1\n", 5, 0)    = 5
recv(4, "\272\377,?\211\246-\245\253\376\247\263\317\247\343j 
\244\nB\377\353\305\366/\354\341\204\377\243\226\354\205\272\203\316F\"\10\276N\10\22\37\312\273\335\347e\33)\276;\326\215|Ne5!\325m\326Nzf\233m\34\7O\254&6;\326\345-DE>\336\372g@\231\243\325\307\213S\321\235\351\25\376\374\10oq\37\265i\32\fS\6\2262]\34\315\201\20\353\t\3\36s\231&3>\1\211\302\276\347\255\223\211\351*z\2651Q\223\231~\247\311\22\16\332\f;h\212\277o\310\222U\7Y\274p\331lA\222\337\206\312\315K\34)\r\3734\256H\316bJ+\30o\5\370BH\252\315O\373(s\316\22xY\345\266w:\\\316$\250\37\213\214)D\33\244\251\336\0\374\207\336\261\241:\222\260\324\333%\346VlE\304\213\240\303=\2420\262\34\372\25\357\323J\272\245u\202\310-\277a]\n\256\376c;\17~@t\306|\262O\36\353X", 
266, 0) = 266

clock_gettime64(CLOCK_REALTIME, {tv_sec=1607339907, tv_nsec=865035973}) = 0
_newselect(5, [4], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
fcntl64(4, F_SETFL, O_RDWR) = 0
ioctl(0, TIOCGPGRP, [21671])    = 0
getpgrp()   = 21671
write(2, "Read error (Connection timed out) in headers.\n", 46Read error 
(Connection timed out) in headers.

) = 46
close(4)    = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1229, ...}) = 0
ioctl(0, TIOCGPGRP, [21671])    = 0
getpgrp()   = 21671
write(2, "Giving up.\n\n", 12Giving up.

)  = 12
close(3)    = 0
exit_group(4)   = ?
+++ exited with 4 +++

wget -S -dv --no-check-certificate --waitretry=1 --tries=1 --timeout=1 
--read-timeout=1 --wait=1 -O ./280200433.tmp 
https://www.tesco.com/groceries/en-GB/products/280200433 
Setting --verbose (verbose) to 1
Setting --verbose (verbose) to 1
Setting --check-certificate (checkcertificate) to 0
Setting --check-certificate (checkcertificate) to 0
Setting --waitretry (waitretry) to 1
Setting --waitretry (waitretry) to 1
Setting --tries (tries) to 1
Setting --tries (tries) to 1
Setting --timeout (timeout) to 1
Setting --timeout (timeout) to 1
Setting --read-timeout (readtimeout) to 1
Setting --read-timeout (readtimeout) to 1
Setting --wait (wait) to 1
Setting --wait (wait) to 1
Setting --output-document (outputdocument) to ./280200433.tmp
Setting --output-document (outputdocument) to ./280200433.tmp
DEBUG output created by Wget 1.20.3 on linux-gnu.

Reading HSTS entries from /home/slv/.wget-hsts
URI encoding = ‘UTF-8’
--2020-12-07 17:16:33--  
https://www.tesco.com/groceries/en-GB/products/280200433
Certificates loaded: 252
Resolving www.tesco.com (www.tesco.com)... 23.54.9.112
Caching www.tesco.com => 23.54.9.112
Connecting to www.tesco.com (www.tesco.com)|23.54.9.112|:443... connected.
Created socket 4.
Releasing 0x02050ba0 (new refcount 1).

---request begin---
GET 

Re: Timeout never happens with https://www.tesco.com

2020-12-06 Thread Tom Crane

On Sun, 6 Dec 2020, Petr Pisar wrote:


V Sun, Dec 06, 2020 at 06:48:13PM +, Tom Crane napsal(a):

tmp$ strace -s999 -f -e trace=network wget -S -v --no-check-certificate 
--waitretry=1 --tries=1 --timeout=1 --read-timeout=1 --wait=1 -O 
/tmp/280200433.tmp https://www.tesco.com/groceries/en-GB/products/280200433

[...]

connect(4, {sa_family=AF_INET, sin_port=htons(443),
sin_addr=inet_addr("104.82.193.249")}, 16) = 0
connected.
HTTP request sent, awaiting response...

but the timeout never happens,

Do not restrict strace to network syscalls. We cannot see where it blocks.
The timeout should occur in select() syscall.

-- Petr



Sure and thanks for the follow-up.  Please see,

https://www.mklab.rhul.ac.uk/~tom/tempwget/wget-strace.log

After waiting several minutes I -C out of the strace/wget session.

Out of interest, can you reproduce the hang with that URL on your 
system/wget version?


Tom.


Re: Timeout never happens with https://www.tesco.com

2020-12-06 Thread Petr Pisar
V Sun, Dec 06, 2020 at 06:48:13PM +, Tom Crane napsal(a):
> tmp$ strace -s999 -f -e trace=network wget -S -v --no-check-certificate 
> --waitretry=1 --tries=1 --timeout=1 --read-timeout=1 --wait=1 -O 
> /tmp/280200433.tmp https://www.tesco.com/groceries/en-GB/products/280200433
[...]
> connect(4, {sa_family=AF_INET, sin_port=htons(443), 
> sin_addr=inet_addr("104.82.193.249")}, 16) = 0
> connected.
> HTTP request sent, awaiting response...
> 
> but the timeout never happens,
Do not restrict strace to network syscalls. We cannot see where it blocks.
The timeout should occur in select() syscall.

-- Petr


signature.asc
Description: PGP signature


Timeout never happens with https://www.tesco.com

2020-12-06 Thread Tom Crane

I am using the following command in a script (shown here running under strace),

tmp$ strace -s999 -f -e trace=network wget -S -v --no-check-certificate 
--waitretry=1 --tries=1 --timeout=1 --read-timeout=1 --wait=1 -O 
/tmp/280200433.tmp https://www.tesco.com/groceries/en-GB/products/280200433
--2020-12-06 17:52:25--  
https://www.tesco.com/groceries/en-GB/products/280200433
Resolving www.tesco.com (www.tesco.com)... socket(AF_UNIX, 
SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 
ENOENT (No such file or directory)
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 
ENOENT (No such file or directory)
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4
setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("192.168.1.254")}, 16) = 0
sendmmsg(4, [{msg_hdr={msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="\357\317\1\0\0\1\0\0\0\0\0\0\3www\5tesco\3com\0\0\1\0\1", 
iov_len=31}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=31}, {msg_hdr={msg_name=NULL, 
msg_namelen=0, 
msg_iov=[{iov_base="\376\303\1\0\0\1\0\0\0\0\0\0\3www\5tesco\3com\0\0\34\0\1", 
iov_len=31}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=31}], 2, MSG_NOSIGNAL) = 2
recvfrom(4, 
"\357\317\201\0\0\1\0\1\0\0\0\0\3www\5tesco\3com\0\0\1\0\1\300\f\0\1\0\1\0\0\1,\0\4hR\301\371",
 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.254")}, 
[28->16]) = 47
recvfrom(4, "\376\303\201\2\0\1\0\0\0\0\0\0\3www\5tesco\3com\0\0\34\0\1", 65536, 0, 
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.254")}, [28->16]) = 
31104.82.193.249
Connecting to www.tesco.com (www.tesco.com)|104.82.193.249|:443... 
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(443), 
sin_addr=inet_addr("104.82.193.249")}, 16) = 0

connected.
HTTP request sent, awaiting response...

but the timeout never happens, it just hangs indefinitely with the TCP
connection remaining in the ESTABLISHED state according to netstat.

I am wondering if the problem is something to do with the initial 
part of the SSL negotiation since it happens very early on after the 
connection is established, and checking old messages in this list, there 
were bug reports about this a few years back.


Accessing the same URL from a web browser works fine, rendering the 
content OK.  This may of course involve client-side scripting but that is 
another matter.  I am assuming wget should be able to read and store 
whatever HTML is produced and not just hang.


I know I could run wget under the Gnutils timeout command as a 
workaround but I would prefer to get wget's timeout to work.


Any thoughts?

Thanks
Tom Crane

System/wget details:
OS: Slackware64-current
Kernel: 5.4.6

$ wget --version
GNU Wget 1.20.3 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie +psl +ssl/openssl

Wgetrc:
/home/tom/.wgetrc (user)
/etc/wgetrc (system)
Locale:
/usr/share/locale
Compile:
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib -DHAVE_LIBSSL
-DNDEBUG -O2 -fPIC
Link:
gcc -DHAVE_LIBSSL -DNDEBUG -O2 -fPIC -lpcre2-8 -luuid -lidn2 -lssl
-lcrypto -lz -lpsl ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
-lunistring

$ cat ~/.wgetrc
robots = off
user-agent = Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 
Firefox/60.0