#997: Uniformize Invenio user-agent string
--------------------+------------------------
Reporter:  skaplun  |      Owner:
    Type:  task     |     Status:  new
Priority:  minor    |  Component:  *general*
 Version:           |   Keywords:  user-agent
--------------------+------------------------
 There are several places in Invenio where Invenio acts as a client
 contacting an external web server. Each of them sets the ''User-Agent''
 string in its own way.

 {{{
 config/invenio.conf:    ('http(s)?://.*', {'User-Agent': 'Invenio'}),
 modules/bibclassify/lib/bibclassify_text_extractor.py: request.add_header("User-Agent", user_agent)
 modules/bibharvest/lib/oai_harvest_getter.py:               "User-Agent":"Invenio %s" % CFG_VERSION}
 modules/bibsword/lib/bibsword_client_http.py:        headers['User-Agent'] = CFG_DEFAULT_USER_AGENT
 modules/bibsword/lib/bibsword_client_http.py:        headers['User-Agent'] = CFG_DEFAULT_USER_AGENT
 modules/bibsword/lib/bibsword_client_http.py:        request.add_header('User-Agent', CFG_DEFAULT_USER_AGENT)
 modules/bibupload/lib/bibupload.py:    request.add_header('User-Agent', '')
 modules/elmsubmit/lib/elmsubmit_tests_1.mbox:User-Agent: Mutt/1.5.9i
 modules/elmsubmit/lib/elmsubmit_tests_2.mbox:User-Agent: Mutt/1.5.9i
 modules/miscutil/lib/errorlib.py:            'browser': 'User-Agent' in req.headers_in and \
 modules/miscutil/lib/errorlib.py:                          req.headers_in['User-Agent'] or "N/A",
 modules/miscutil/lib/invenio_connector.py:            opener.addheaders = [('User-agent', 'invenio_webupload')]
 modules/miscutil/lib/inveniocfg.py:                Header append Vary User-Agent env=!dont-vary
 modules/miscutil/lib/mailutils.py:    msg_root['User-Agent'] = 'Invenio %s at %s' % (CFG_VERSION, CFG_SITE_URL)
 modules/websearch/lib/websearch_external_collections_getter.py: "User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/48 (like Gecko) Safari/48\r\n" + \
 modules/websession/lib/webuser.py:            user_info['agent'] = req.headers_in.get('User-Agent', 'N/A')
 modules/webstyle/lib/webinterface_handler.py: os.environ["HTTP_USER_AGENT"] = req.headers_in.get('User-Agent', '')
 modules/webstyle/lib/webinterface_handler.py:                g = _RE_BAD_MSIE.search(req.headers_in.get('User-Agent', "MSIE 6.0"))
 modules/webstyle/lib/webstyle_templates.py:                if req.headers_in.has_key('User-Agent'):
 modules/webstyle/lib/webstyle_templates.py:                    browser_s += ': ' + req.headers_in['User-Agent']
 modules/websubmit/lib/bibdocfile.py:    g = _RE_BAD_MSIE.search(headers.get('user-agent', "MSIE 6.0"))
 }}}

 This usage could be made more consistent, for example by taking
 inspiration from
 https://developer.mozilla.org/en/Gecko_user_agent_string_reference and
 http://www.user-agents.org/

 and building the string like this:
 {{{#!python
 "Invenio %s (+%s; \"%s\") %s" % (CFG_VERSION, CFG_SITE_URL, CFG_SITE_NAME,
 component)
 }}}

 For example, a request made by BibUpload on my machine would send:
 {{{
 Invenio 1.0.0.951-c6f69-dirty (+http://pcsk4.cern.ch; "Atlantis Institute of Sam") BibUpload
 }}}

 This code could live in ''urlutils'', in a function called e.g.

 {{{#!python
 def make_user_agent_string(component=None):
     pass
 }}}
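
 As a rough illustration of the proposal above, the stub could be filled in
 along the following lines. This is only a sketch: the three CFG_ constants
 are hard-coded placeholders (in Invenio they would come from the site
 configuration), the decision to omit the trailing component when none is
 given is an assumption, and the URL in the usage lines is hypothetical. The
 request is only built, never sent.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2

# Placeholder values standing in for the real site configuration;
# they are assumptions copied from the example in this ticket.
CFG_VERSION = "1.0.0.951-c6f69-dirty"
CFG_SITE_URL = "http://pcsk4.cern.ch"
CFG_SITE_NAME = "Atlantis Institute of Sam"


def make_user_agent_string(component=None):
    """Return a uniform User-Agent string for outgoing requests.

    component is an optional label for the calling module,
    e.g. "BibUpload".
    """
    user_agent = 'Invenio %s (+%s; "%s")' % (
        CFG_VERSION, CFG_SITE_URL, CFG_SITE_NAME)
    if component:
        # Append the component only when given, so that no
        # trailing space is left in the header value.
        user_agent += " %s" % component
    return user_agent


# Usage at a call site: build (but do not send) a request.
# The target URL is hypothetical.
request = Request("http://example.org/")
request.add_header("User-Agent", make_user_agent_string("BibUpload"))
print(request.get_header("User-agent"))
```

 Each call site listed above that sets its own header value could then call
 the shared function with its module name instead of hard-coding a string.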

-- 
Ticket URL: <http://invenio-software.org/ticket/997>
Invenio <http://invenio-software.org>
