[issue34777] urllib.request accepts anything as a header parameter for some URLs

Jose Gama Thu, 27 Sep 2018 13:36:35 -0700

Jose Gama <[email protected]> added the comment:

Thank you for the quick reply. You are correct about the difficulties of using 
a universally accepted list. This is one example that generates errors on the 
server side. Just for the record.


#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

# process SSB data
url1 = 'https://raw.githubusercontent.com/mapnik/test-data/master/csv/points.csv'
url2 = 'https://gitlab.cncf.ci/kubernetes/kubernetes/raw/c69582dffba33e9f1c08ff2fc67924ea90f1448c/test/test_owners.csv'
url3 = 'http://data.ssb.no/api/klass/v1/classifications/131/changes?from=2016-01-01&to=9999-12-31'

headers1 = {'Accept': 'text/csv'}
headers2 = {'Akcept': 'text/csv'}
headers3 = {'Accept': 'tekst/cxv'}
headers4 = {'Accept': '1234'}

req = Request(url3, headers=headers4)
resp = urlopen(req)
# get the character encoding from the server response
content = resp.read().decode(resp.headers.get_content_charset())
print(content)
'''req = Request(url3, headers=headers3)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

req = Request(url3, headers=headers4)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable'''
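
The behaviour in the script above can also be observed without contacting any
server. The following is a minimal sketch, not part of the original report: the
URL is a placeholder and stored_headers is a hypothetical helper name. It only
shows that a Request object keeps whatever header name and value it is given.

```python
from urllib.request import Request

# Placeholder URL (an assumption for illustration); no request is sent.
URL = 'http://example.invalid/data'


def stored_headers(headers):
    """Return the headers a Request object holds, as a plain dict."""
    req = Request(URL, headers=headers)
    return dict(req.header_items())


# A misspelled header name and a malformed Accept value are both kept as given.
print(stored_headers({'Akcept': 'text/csv'}))
print(stored_headers({'Accept': '1234'}))
```

Whether such a header is then rejected is left entirely to the server.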

    On Tuesday, September 25, 2018, 8:38:26 AM GMT+2, Karthikeyan Singaravelan 
<[email protected]> wrote:  

Karthikeyan Singaravelan <[email protected]> added the comment:

Thanks for the report. I tried similar requests, and other tools such as curl 
behave the same way, since Akcept could be a custom header in some use cases, 
though it could be a typo in this context. As far as I can see from 
https://tools.ietf.org/html/rfc2616#section-14.1 there is no predefined set of 
media types that we need to validate, and validation depends on the server 
configuration. It's hard for Python to maintain a list of acceptable MIME types 
for validation across releases. A periodically updated list of registered MIME 
types: https://www.iana.org/assignments/media-types/media-types.xhtml and the 
RFC for registration: https://tools.ietf.org/html/rfc6838
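
For callers who want a check on their own side, a purely syntactic test is
possible without any list of registered types. The following is a rough sketch
under stated assumptions: looks_like_accept_value is a hypothetical helper, not
something urllib provides, and it only checks that each item has a
type/subtype shape built from HTTP token characters.

```python
import re

# HTTP "token" characters, used here for the type and subtype names.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_MEDIA_RANGE = re.compile(rf"^{_TOKEN}/{_TOKEN}$")


def looks_like_accept_value(value):
    """Hypothetical helper: check only the type/subtype shape of each item."""
    for item in value.split(','):
        # Drop parameters such as ";q=0.9" before checking the media range.
        media_range = item.split(';', 1)[0].strip()
        if not _MEDIA_RANGE.match(media_range):
            return False
    return True


for candidate in ('text/csv', 'tekst/cxv', '1234'):
    print(candidate, looks_like_accept_value(candidate))
```

A well-formed but unregistered value such as 'tekst/cxv' still passes this
check; catching it would need the registered list mentioned above.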

Some sample requests from curl with invalid headers.

curl -X GET https://httpbin.org/get -H 'Authorization: Token 
bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 
'Content-Type: application/json' -H 'Akcept: tekst/csv'
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Akcept": "tekst/csv",
    "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
    "Cache-Control": "no-cache",
    "Connection": "close",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.37.1"
  },
  "origin": "182.73.135.26",
  "url": "https://httpbin.org/get"
}

curl -X GET https://httpbin.org/get -H 'Authorization: Token 
bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 
'Content-Type: application/json' -H 'Accept: tekst'
{
  "args": {},
  "headers": {
    "Accept": "tekst",
    "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
    "Cache-Control": "no-cache",
    "Connection": "close",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.37.1"
  },
  "origin": "182.73.135.26",
  "url": "https://httpbin.org/get"
}

Feel free to add in if I am missing something here, but I think it's hard for 
Python to maintain the updated list, and adding a warning/error might break 
someone's code.

Thanks

----------
nosy: +xtreak

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34777>
_______________________________________

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
