[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-10 Thread Charles-François Natali

Charles-François Natali added the comment:

According to http://tools.ietf.org/html/rfc2616#section-8.1,
persistent connections are default in HTTP 1.1.
However, here's what's said about the persistence negotiation:

   An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to
   maintain a persistent connection unless a Connection header including
   the connection-token close was sent in the request. If the server
   chooses to close the connection immediately after sending the
   response, it SHOULD send a Connection header including the
   connection-token close.

   An HTTP/1.1 client MAY expect a connection to remain open, but would
   decide to keep it open based on whether the response from a server
   contains a Connection header with the connection-token close. In case
   the client does not want to maintain a connection for more than that
   request, it SHOULD send a Connection header including the
   connection-token close.

   If either the client or the server sends the close token in the
   Connection header, that request becomes the last one for the
   connection.


And if you have a look at the strace output (or a tcpdump capture),
you see that httpserver indeeds sends a Connection: close header,
which wget ignores:

select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Connection: close\r\n, 390, MSG_PEEK, NULL, NULL) = 19
read(5, Connection: close\r\n, 19)= 19
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, \r\n, 371, MSG_PEEK, NULL, NULL) = 2
read(5, \r\n, 2)  = 2
select(6, [5], NULL, NULL, {0, 1})  = 0 (Timeout)
select(6, NULL, [5], NULL, {900, 0})= 1 (out [5], left {900, 0})
write(5, GET /FOUND-8154 HTTP/1.1\r\nUser-A..., 124) = 124
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, head\ntitleError response/ti..., 511, MSG_PEEK,
NULL, NULL) = 190
stat(FOUND-8154, 0x7fffcc530d60)  = -1 ENOENT (No such file or directory)
open(FOUND-8154, O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
clock_gettime(CLOCK_MONOTONIC, {130079, 24215187}) = 0
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
read(5, head\ntitleError response/ti..., 8192) = 190
clock_gettime(CLOCK_MONOTONIC, {130079, 24258187}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b0004bbc000
write(7, head\ntitleError response/ti..., 190) = 190
write(4, \n 0K..., 777) = 777
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
read(5, 0x1becc8f0, 8192)   = -1 ECONNRESET (Connection
reset by peer)
close(5)= 0


So it really looks like a wget bug.
I'd suggest closing as invalid, and reporting this to wget developers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Furthermore, BaseHTTPServer responds with a HTTP/1.0 status line (not 
HTTP/1.1), meaning persistent connections are not the default.

Therefore, I'm closing as invalid.

--
resolution:  - invalid
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-09 Thread Charles-François Natali

Charles-François Natali added the comment:

wget trace:

Here a 404 is treated correctly:

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(4242),
sin_addr=inet_addr(127.0.0.1)}, 16) = 0
select(6, NULL, [5], NULL, {900, 0})= 1 (out [5], left {900, 0})
write(5, GET /MISSING-8276 HTTP/1.1\r\nUser..., 126) = 126
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {899,
999000})
recvfrom(5, HTTP/1.0 404 Not Found\r\n, 511, MSG_PEEK, NULL, NULL) = 24
read(5, HTTP/1.0 404 Not Found\r\n, 24) = 24
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Server: BaseHTTP/0.3 Python/2.4, 487, MSG_PEEK,
NULL, NULL) = 35
read(5, Server: BaseHTTP/0.3 Python/2.4, 35) = 35
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Date: Tue, 09 Oct 2012 22:09:59 ..., 452, MSG_PEEK,
NULL, NULL) = 37
read(5, Date: Tue, 09 Oct 2012 22:09:59 ..., 37) = 37
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Content-Type: text/html\r\n, 415, MSG_PEEK, NULL, NULL) = 25
read(5, Content-Type: text/html\r\n, 25) = 25
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Connection: close\r\n, 390, MSG_PEEK, NULL, NULL) = 19
read(5, Connection: close\r\n, 19)= 19
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, \r\n, 371, MSG_PEEK, NULL, NULL) = 2
read(5, \r\n, 2)  = 2
select(6, [5], NULL, NULL, {0, 1})  = 1 (in [5], left {0, 1})
close(5)= 0


And here a 404 is treated incorrectly:

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(4242),
sin_addr=inet_addr(127.0.0.1)}, 16) = 0
select(6, NULL, [5], NULL, {900, 0})= 1 (out [5], left {900, 0})
write(5, GET /MISSING-8153 HTTP/1.1\r\nUser..., 126) = 126
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {899, 998000})
recvfrom(5, HTTP/1.0 404 Not Found\r\n, 511, MSG_PEEK, NULL, NULL) = 24
read(5, HTTP/1.0 404 Not Found\r\n, 24) = 24
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Server: BaseHTTP/0.3 Python/2.4, 487, MSG_PEEK,
NULL, NULL) = 35
read(5, Server: BaseHTTP/0.3 Python/2.4, 35) = 35
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Date: Tue, 09 Oct 2012 22:10:01 ..., 452, MSG_PEEK,
NULL, NULL) = 37
read(5, Date: Tue, 09 Oct 2012 22:10:01 ..., 37) = 37
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Content-Type: text/html\r\n, 415, MSG_PEEK, NULL, NULL) = 25
read(5, Content-Type: text/html\r\n, 25) = 25
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, Connection: close\r\n, 390, MSG_PEEK, NULL, NULL) = 19
read(5, Connection: close\r\n, 19)= 19
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, \r\n, 371, MSG_PEEK, NULL, NULL) = 2
read(5, \r\n, 2)  = 2
HERE= select(6, [5], NULL, NULL, {0, 1})  = 0 (Timeout)
select(6, NULL, [5], NULL, {900, 0})= 1 (out [5], left {900, 0})
write(5, GET /FOUND-8154 HTTP/1.1\r\nUser-A..., 124) = 124
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
recvfrom(5, head\ntitleError response/ti..., 511, MSG_PEEK,
NULL, NULL) = 190
stat(FOUND-8154, 0x7fffcc530d60)  = -1 ENOENT (No such file or directory)
open(FOUND-8154, O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
clock_gettime(CLOCK_MONOTONIC, {130079, 24215187}) = 0
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
read(5, head\ntitleError response/ti..., 8192) = 190
clock_gettime(CLOCK_MONOTONIC, {130079, 24258187}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b0004bbc000
write(7, head\ntitleError response/ti..., 190) = 190
write(4, \n 0K..., 777) = 777
select(6, [5], NULL, NULL, {900, 0})= 1 (in [5], left {900, 0})
read(5, 0x1becc8f0, 8192)   = -1 ECONNRESET (Connection
reset by peer)
close(5)= 0


You can se that after 404 and Connection: close is returned, wget
calls select() with a 1us timeout, to check if the socket has been
closed by the other end.
That's /src/connect.c:test_socket_open() in wget source code.
This is used by persistent_available_p() in src/http.c, with this comment:

Finally, check whether the connection is still open.  This is
 important because most servers implement liberal (short) timeout
 on persistent connections.  Wget can of course always reconnect
 if the connection doesn't work out, but it's nicer to know in
 advance.


So it deliberately reuses the same connection (so-called persistent
connection) if the http server doesn't close it right away. This
explains probably why you see this mainly when many threads are in
use.
I'm not an http expert, so I'm not sure whether 

[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-08 Thread Mikhail Afanasyev

Mikhail Afanasyev added the comment:

The bug is not only wget-specific. It was discovered while making APT proxy, so 
at least Debian APT fetcher has the same problem.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-08 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 The bug is not only wget-specific. It was discovered while making APT
 proxy, so at least Debian APT fetcher has the same problem.

Well, perhaps it uses wget too?

If you really think this is a bug in Python, you have to diagnose it further 
(for example by using Wireshark and analyzing the bytes exchanged between 
server and client). I've done all the debugging I can.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-08 Thread Charles-François Natali

Charles-François Natali added the comment:

 The bug is not only wget-specific. It was discovered while making APT proxy, 
 so at least Debian APT fetcher has the same problem.

Debian uses wget.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-10-08 Thread Charles-François Natali

Charles-François Natali added the comment:

 The bug is not only wget-specific. It was discovered while making APT proxy, 
 so at least Debian APT fetcher has the same problem.

 Debian uses wget.

I meant apt-get uses wget.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-09-23 Thread Thomas Lee

Thomas Lee added the comment:

I can reproduce this on Debian Wheezy. Both with the system Python (2.7.3rc2) 
and the 2.7 branch built from source.

--
nosy: +thomaslee

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-09-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I can reproduce under Debian testing, but not Debian stable nor Mageia 1.

I think this is really a bug in recent Debian/Ubuntu wget. By instrumenting 
SocketServer, I see that less requests are received than expected (e.g. ~350 
instead of 400). It seems wget is trying to send further requests on the same 
client socket, even though the server responded with a HTTP/1.0 status line and 
without a connection: keep-alive header, indicating it doesn't want 
keep-alive. Even explicitly sending a connection: close header doesn't seem 
to change wget's behaviour.

If you pass --no-keep-alive to wget, though, it solves the issue.

Running Wireshark would probably help confirm this.

--
nosy: +neologix, pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-09-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Note that forcing a content length on error responses also seems to make wget 
happy:


diff --git a/Lib/BaseHTTPServer.py b/Lib/BaseHTTPServer.py
--- a/Lib/BaseHTTPServer.py
+++ b/Lib/BaseHTTPServer.py
@@ -362,14 +362,19 @@
 message = short
 explain = long
 self.log_error(code %d, message %s, code, message)
-# using _quote_html to prevent Cross Site Scripting attacks (see bug 
#1100201)
-content = (self.error_message_format %
-   {'code': code, 'message': _quote_html(message), 'explain': 
explain})
+if self.command != 'HEAD' and code = 200 and code not in (204, 304):
+# using _quote_html to prevent Cross Site Scripting attacks (see 
bug #1100201)
+content = (self.error_message_format %
+   {'code': code, 'message': _quote_html(message), 
'explain': explain})
+else:
+content = None
 self.send_response(code, message)
 self.send_header(Content-Type, self.error_content_type)
+if content is not None:
+self.send_header(Content-Length, str(len(content)))
 self.send_header('Connection', 'close')
 self.end_headers()
-if self.command != 'HEAD' and code = 200 and code not in (204, 304):
+if content is not None:
 self.wfile.write(content)
 
 error_message_format = DEFAULT_ERROR_MESSAGE

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-09-21 Thread Catalin Iacob

Catalin Iacob added the comment:

Can't reproduce this on openSUSE Factory (development version, what will become 
12.3).

I tried with openSUSE's Python 2 which is 2.7.3 and with self compiled up to 
date 2.7 branch from hg. For each of these I bumped the number of files to 3000 
instead of 300 and ran the script around 10 times, always got 3000 good 0 bad.

--
nosy: +catalin.iacob

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15991] BaseHTTPServer with ThreadingMixIn serving wrong data sometimes

2012-09-20 Thread Mikhail Afanasyev

New submission from Mikhail Afanasyev:

When using BaseHTTPServer with ThreadingMixIn, sometimes the wrong data is 
served. 

The attached script requests normal URL and URLs which are not found in 
multiple threads using wget (which only saves pages if it gets 200 OK status).
Every once in a while responses are incorrect.
 - sometimes normal URL gets a '404 not found' content with '200 OK' status.
 - sometimes, a '404 not found' page gets '200 OK' status code

The problems become much, much less frequent if ThreadingMixIn is removed.

Python 2.7.3 on ubuntu 12.04 x86_64.

--
components: Library (Lib)
files: http_server_bug.py
messages: 170840
nosy: theamk
priority: normal
severity: normal
status: open
title: BaseHTTPServer with ThreadingMixIn serving wrong data sometimes
versions: Python 2.7
Added file: http://bugs.python.org/file27237/http_server_bug.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15991
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com