[issue22450] urllib doesn't put Accept: */* in the headers

2016-09-09 Thread Senthil Kumaran

Senthil Kumaran added the comment:

@Martin, I give weight to curl's behavior for de facto conventions that differ slightly 
from standards. It's simply what folks have gotten used to, and sometimes 
expect.

@Raymond, unit-tests will be a good addition too.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2016-09-09 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
resolution:  -> fixed
status: open -> closed


2016-09-09 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 00da8bfa2a60 by Raymond Hettinger in branch '3.5':
Issue #22450: Use "Accept: */*" in the default headers for urllib.request
https://hg.python.org/cpython/rev/00da8bfa2a60

--


2016-09-09 Thread Kenneth Reitz

Kenneth Reitz added the comment:

I fully second Cory's comment.

--
nosy: +kennethreitz


2016-09-09 Thread Roundup Robot

Roundup Robot added the comment:

New changeset e84105b48436 by Raymond Hettinger in branch '2.7':
Issue #22450: Use "Accept: */*" in the default headers for urllib
https://hg.python.org/cpython/rev/e84105b48436

--
nosy: +python-dev


2016-09-09 Thread Martin Panter

Martin Panter added the comment:

I’m still not convinced. But my argument about the user specifying Accept if 
they care about the media type works both ways, so I am not that fussed if 
others want to make the change.

Are there any examples of servers that behave worse than the application/json 
vs text/javascript example? E.g. returning XML vs JSON or something?

--


2016-09-09 Thread Cory Benfield

Cory Benfield added the comment:

So, leaping in on the Requests side of things for a moment, two notes. Firstly: 
copying curl is rarely a bad thing to do, especially for a behaviour curl has 
had for a long time.

However, in this case the stronger argument is that just because the RFCs say 
that Accept: */* is implied doesn't mean it can safely be omitted. In practice, 
origin servers behave unexpectedly when the header is omitted, and in general 
behave more predictably when it is emitted. For that reason, it should be added 
by Python's standard library. 

HTTP/1.1 is a protocol where "as deployed" means much more than "as specified", 
sadly.

--
nosy: +Lukasa


2016-08-30 Thread Martin Panter

Martin Panter added the comment:

“Proxy servers such as NGinx and Varnish: . . . if the Accept header is 
omitted, the proxy cache can return any of the cached responses.”

This is not really my area of expertise, but this behaviour is inconsistent 
with my understanding of how Accept and Vary are supposed to work in general. I 
would expect a cache to treat a missing Accept field as a separate “value” that 
does not match any specific Accept value.

See . Also, 
what about a server that sets “Vary: Cookie”, to send a response that depends 
on whether the user has already seen the page. Do these NGinx and Varnish 
caches respond with a random response if Cookie is missing?

I still think if you care about the media type, it is better practice to 
specify what types you want with a more explicit Accept value. And if you don’t 
care about the media type, the NGinx/Varnish behaviour may not be a problem 
anyway.

--


2016-08-30 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Putting it another way:   To an origin server, 'Accept: */*' means it can 
return anything it wants.  To a proxy server, the absence of an accept header 
means it can return anything it has cached (possibly different from what the 
origin server would have returned).  In contrast, to a proxy server, 'Accept: 
*/*' means return exactly what the origin server would have returned with the 
same headers.

--


2016-08-30 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Update:  After more research, I learned that while 'Accept: */*' should not 
have an effect on the origin webserver, it can and does have an effect on proxy 
servers.

Origin servers are allowed to vary the content-type of responses when given 
different Accept headers.  When they do so, they should also send "Vary: 
Accept".   

Proxy servers such as NGinx and Varnish respond to the "Vary: Accept" by 
caching the different responses using a combination of url and the accept 
header as the cache key.  If the request has 'Accept: */*', then the cache 
lookup returns the same result as if the 'Accept: */*' had been passed directly 
to the server.  However, if the Accept header is omitted, the proxy cache can 
return any of the cached responses (typically the most recent, regardless of 
content-type).

Accordingly, it is a good practice to include 'Accept: */*' in the request so 
that you get a consistent result (what the server would have returned) rather 
than the inconsistent and unpredictable content-types you would receive in the 
absence of the Accept header.  I believe that is why the other tools and book 
examples use 'Accept: */*' even though the origin wouldn't care.
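To make the caching behaviour described above concrete, here is a toy model of a shared cache that stores variants of a "Vary: Accept" resource keyed on (URL, Accept). It is an illustration under stated assumptions, not how NGinx or Varnish are implemented: the class ToyVaryAcceptCache, the hypothetical example_origin function, and the "most recently stored variant" rule for requests without Accept are all assumptions, and a response is represented only by its Content-Type. Setting missing_accept_matches_any=False models Martin Panter's expectation that a missing Accept field is treated as a separate value.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin signature: (url, accept or None) -> Content-Type of the response.
Origin = Callable[[str, Optional[str]], str]


class ToyVaryAcceptCache:
    """Toy shared cache for a resource the origin marks "Vary: Accept" (illustrative)."""

    def __init__(self, origin: Origin, missing_accept_matches_any: bool) -> None:
        self.origin = origin
        # True: a request without Accept gets whichever variant was stored most
        #   recently (the proxy behaviour described in this issue).
        # False: "no Accept header" is a cache key of its own.
        self.missing_accept_matches_any = missing_accept_matches_any
        self.variants: Dict[Tuple[str, Optional[str]], str] = {}
        self.latest: Dict[str, str] = {}  # most recently stored variant per URL

    def get(self, url: str, accept: Optional[str]) -> str:
        if accept is None and self.missing_accept_matches_any and url in self.latest:
            return self.latest[url]
        key = (url, accept)
        if key not in self.variants:
            # Cache miss: forward the request to the origin with the same Accept value.
            response = self.origin(url, accept)
            self.variants[key] = response
            self.latest[url] = response
        return self.variants[key]


def example_origin(url: str, accept: Optional[str]) -> str:
    """Hypothetical origin: XML only when explicitly asked for, JSON otherwise."""
    return "application/xml" if accept == "application/xml" else "application/json"


proxy = ToyVaryAcceptCache(example_origin, missing_accept_matches_any=True)
proxy.get("/api/user", "application/xml")  # another client primes the cache
print(proxy.get("/api/user", None))   # application/xml (not what the origin sends)
print(proxy.get("/api/user", "*/*"))  # application/json (same as the origin)
```

In this model, sending Accept: */* always yields what the origin itself would return for that header, while omitting the header exposes whatever another client happened to cache last.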

--


2015-11-02 Thread Martin Panter

Martin Panter added the comment:

The Curl programmer replied basically saying there was no scientific reason, 
but since Curl was previously sending a custom Accept header, it was safer to 
leave a bare-bones Accept header in than completely remove it. Plus he thought 
it might be slightly more compatible with websites.

--


2015-11-01 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> What do you think, Raymond

Before dismissing this, we should get a better understanding of why "Accept: 
*/*" is so widely used in practice.

Here's what we know so far:
* The header made a difference to the Facebook Graph API.
* Curl (a minimalist) includes "Accept: */*", Host, and User-Agent.
* Firefox includes "*/*" at the end of its list of acceptable types.
* Kenneth Reitz's requests module uses "Accept: */*" by default.
* The poolmanager in urllib3 uses "Accept: */*" by default and has a comment 
that that and the "Host" header are both needed by proxies.
* I'm also seeing "Accept: */*" in book examples as well.  See 
https://books.google.com/books?id=fVuWayXLdYIC=PA22 and 
http://doc.bonfire-project.eu/R1/api/example-session.html

--


2015-11-01 Thread Martin Panter

Martin Panter added the comment:

According to all the HTTP 1.1 RFCs, having */* at the end means you accept any 
other content type if none of the higher priority ones are available (otherwise 
you risk a 406 Not Acceptable error). So that explains why Firefox has */* 
tacked on.
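A simplified sketch of server-side negotiation following the Accept rules of RFC 7231 section 5.3.2: each offered type takes the q-value of the most specific matching media range, q=0 means "not acceptable", and a missing Accept header is treated as */*. The function names, the tie-break in favour of the server's own ordering, and the browser-style header value in the test are assumptions; media-type parameters other than q are ignored.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept value into (media_range, q) pairs (simplified parser)."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """q-value from the most specific range matching media_type, or 0.0."""
    media_type = media_type.lower()
    main_type = media_type.partition("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2  # exact match, e.g. application/json
        elif media_range == main_type + "/*":
            specificity = 1  # subtype wildcard, e.g. application/*
        elif media_range == "*/*":
            specificity = 0  # matches anything
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept: Optional[str], offered: List[str]) -> Optional[str]:
    """Choose the offered type with the highest q; None stands for 406 Not Acceptable."""
    if accept is None:
        # RFC 7231: no Accept header means any media type is acceptable.
        accept = "*/*"
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in offered:  # on ties, keep the server's earlier preference
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

Under these rules a trailing "*/*;q=0.8" only matters when no preferred type is on offer, and a request with no Accept header selects the same representation as one with "Accept: */*".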

Requests copied from Curl: 
. Similarly, it is in 
urllib3 “because that’s what cURL had by default”. Brief discussion at 
, where they 
decided to leave things as they already were.

So all roads seem to lead to Curl. Curl’s “initial revision” (Dec 1999) had 
“Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*”, which was 
changed to “Accept: */*” in  in 
2004. I don’t see any reasons given. I just left a question on GitHub about
this, so maybe we might get some sort of answer.

Wget also includes “Accept: */*”. But it gives no explanations either, and it 
was present right from the “initial revision” also in Dec 1999 (presumably 
SourceForge started about then).

--


2015-10-31 Thread Martin Panter

Martin Panter added the comment:

I propose rejecting this one, in favour of the caller adding their own “Accept: 
*/*” (or more preferably, “Accept: application/json”) header. What do you 
think, Raymond or Senthil?

--


2015-04-11 Thread Martin Panter

Martin Panter added the comment:

The RFC https://tools.ietf.org/html/rfc7231#page-39 says “A request without 
any Accept header field implies that the user agent will accept any media type 
in response”, which sounds the same as “Accept: */*”. I don’t understand why 
adding it should make a real difference.

If you really desire only application/json, you should probably include 
“Accept: application/json” in the request. Otherwise, it would probably be more 
robust to make your program accept both types. I have come across the same deal 
with application/atom+xml vs text/xml vs application/xml.
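Both suggestions above, asking explicitly for JSON and tolerating the JSON-like types servers actually send, might look like this with Python 3's urllib.request. The helper names build_json_request and looks_like_json, and the set of types treated as JSON, are assumptions; nothing is sent over the network here.

```python
import urllib.request
from email.message import Message

# Assumed set of content types to treat as JSON; adjust for the API at hand.
JSON_LIKE_TYPES = {"application/json", "text/json", "text/javascript"}


def build_json_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks explicitly for JSON."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def looks_like_json(content_type: str) -> bool:
    """True if a Content-Type value names a JSON-like media type."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lower-cased, parameters such as charset dropped
    return media_type in JSON_LIKE_TYPES or media_type.endswith("+json")
```

With a live response, `response.headers` is an `http.client.HTTPMessage`, a subclass of `email.message.Message`, so `response.headers.get_content_type()` yields the same normalised value directly.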

--
nosy: +vadmium


2014-09-21 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever


2014-09-20 Thread Raymond Hettinger

New submission from Raymond Hettinger:

The use of urllib for REST APIs is impaired in the absence of an Accept: */* 
header such as the one added automatically by the requests package or by the curl
command-line tool.



# Example that gets an incorrect result due to the missing header (Python 2)
import urllib
print urllib.urlopen('http://graph.facebook.com/raymondh').headers['Content-Type']

# Equivalent call using curl
$ curl -v http://graph.facebook.com/raymondh
...
* Connected to graph.facebook.com (31.13.75.1) port 80 (#0)
> GET /raymondh HTTP/1.1
> User-Agent: curl/7.30.0
> Host: graph.facebook.com
> Accept: */*


--
files: accept.diff
keywords: patch
messages: 227194
nosy: rhettinger
priority: normal
severity: normal
stage: patch review
status: open
title: urllib doesn't put Accept: */* in the headers
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file36673/accept.diff


2014-09-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Can you explain how the result is incorrect?

>>> f = urllib.request.urlopen('http://graph.facebook.com/raymondh')
>>> json.loads(f.read().decode())
{'link': 'https://www.facebook.com/raymondh', 'id': '562805507', 'last_name': 
'Hettinger', 'gender': 'male', 'first_name': 'Raymond', 'name': 'Raymond 
Hettinger', 'locale': 'en_US', 'username': 'raymondh'}

--
nosy: +pitrou


2014-09-20 Thread Senthil Kumaran

Senthil Kumaran added the comment:

Patch looks good. Will need similar addition in urllib2 and inclusion of tests.
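Tests for this kind of change need no network access. The sketch below opts in to a default "Accept: */*" at the opener level through `OpenerDirector.addheaders`, then passes a `Request` through the opener's HTTPHandler pre-processing hook (`http_request`), which adds each `addheaders` entry the request does not already carry. It makes no claim about what any Python release sends by default; the helper `prepare()` and the test names are hypothetical, and calling the hook directly is a testing shortcut rather than the normal `opener.open()` path.

```python
import unittest
import urllib.request


def prepare(opener: urllib.request.OpenerDirector,
            req: urllib.request.Request) -> urllib.request.Request:
    """Apply the opener's HTTP request pre-processing without any network I/O."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.HTTPHandler):
            return handler.http_request(req)
    raise RuntimeError("opener has no HTTPHandler")


class DefaultAcceptHeaderTest(unittest.TestCase):
    def setUp(self) -> None:
        self.opener = urllib.request.build_opener()
        # Opt in to a curl-like default on this opener only; User-agent is kept.
        self.opener.addheaders.append(("Accept", "*/*"))

    def test_default_accept_added(self) -> None:
        req = prepare(self.opener, urllib.request.Request("http://example.com/"))
        self.assertEqual(req.get_header("Accept"), "*/*")

    def test_explicit_accept_not_overridden(self) -> None:
        req = urllib.request.Request(
            "http://example.com/", headers={"Accept": "application/json"})
        req = prepare(self.opener, req)
        self.assertEqual(req.get_header("Accept"), "application/json")
```

Saved as a module, these run with `python -m unittest`; the second test pins down that a caller's own Accept value still wins over any default.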

--
nosy: +orsenthil
versions: +Python 3.4, Python 3.5


2014-09-20 Thread Senthil Kumaran

Senthil Kumaran added the comment:

Well, the result of loading it with json will be the same, but the headers differ 
when Accept: */* is not sent. The content-type returned is text/javascript; charset=UTF-8 and
with sending of Accept */* the content-type is set to application/json; 
charset=UTF-8 (which is more desirable).

--


2014-09-20 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> The content-type returned is text/javascript; charset=UTF-8 and with
> sending of Accept */* the content-type is set to application/json;
> charset=UTF-8 (which is more desirable).

Is that a bug in urllib, or in Facebook's HTTP implementation?
Frankly, we shouldn't jump to conclusions just because one specific use case is 
made better by this. Forcing an accept header may totally change the output of 
other servers and break existing uses.

(and besides, the content-type header is unimportant when you know what to 
expect, which is normally the case when calling an API)

--
