#997: Uniformize Invenio user-agent string
--------------------+------------------------
Reporter: skaplun | Owner:
Type: task | Status: new
Priority: minor | Component: *general*
Version: | Keywords: user-agent
--------------------+------------------------
There are several pieces of code in Invenio where Invenio acts as a client
contacting an external web server. Each of them sets the ''User-Agent''
string in its own way:
{{{
config/invenio.conf: ('http(s)?://.*', {'User-Agent': 'Invenio'}),
modules/bibclassify/lib/bibclassify_text_extractor.py: request.add_header("User-Agent", user_agent)
modules/bibharvest/lib/oai_harvest_getter.py: "User-Agent":"Invenio %s" % CFG_VERSION}
modules/bibsword/lib/bibsword_client_http.py: headers['User-Agent'] = CFG_DEFAULT_USER_AGENT
modules/bibsword/lib/bibsword_client_http.py: headers['User-Agent'] = CFG_DEFAULT_USER_AGENT
modules/bibsword/lib/bibsword_client_http.py: request.add_header('User-Agent', CFG_DEFAULT_USER_AGENT)
modules/bibupload/lib/bibupload.py: request.add_header('User-Agent', '')
modules/elmsubmit/lib/elmsubmit_tests_1.mbox:User-Agent: Mutt/1.5.9i
modules/elmsubmit/lib/elmsubmit_tests_2.mbox:User-Agent: Mutt/1.5.9i
modules/miscutil/lib/errorlib.py: 'browser': 'User-Agent' in req.headers_in and \
modules/miscutil/lib/errorlib.py: req.headers_in['User-Agent'] or "N/A",
modules/miscutil/lib/invenio_connector.py: opener.addheaders = [('User-agent', 'invenio_webupload')]
modules/miscutil/lib/inveniocfg.py: Header append Vary User-Agent env=!dont-vary
modules/miscutil/lib/mailutils.py: msg_root['User-Agent'] = 'Invenio %s at %s' % (CFG_VERSION, CFG_SITE_URL)
modules/websearch/lib/websearch_external_collections_getter.py: "User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/48 (like Gecko) Safari/48\r\n" + \
modules/websession/lib/webuser.py: user_info['agent'] = req.headers_in.get('User-Agent', 'N/A')
modules/webstyle/lib/webinterface_handler.py: os.environ["HTTP_USER_AGENT"] = req.headers_in.get('User-Agent', '')
modules/webstyle/lib/webinterface_handler.py: g = _RE_BAD_MSIE.search(req.headers_in.get('User-Agent', "MSIE 6.0"))
modules/webstyle/lib/webstyle_templates.py: if req.headers_in.has_key('User-Agent'):
modules/webstyle/lib/webstyle_templates.py: browser_s += ': ' + req.headers_in['User-Agent']
modules/websubmit/lib/bibdocfile.py: g = _RE_BAD_MSIE.search(headers.get('user-agent', "MSIE 6.0"))
}}}
This usage could be made more consistent. For example, we could take
inspiration from
https://developer.mozilla.org/en/Gecko_user_agent_string_reference and
http://www.user-agents.org/
and build the string along these lines:
{{{#!python
"Invenio %s (+%s; \"%s\") %s" % (CFG_VERSION, CFG_SITE_URL, CFG_SITE_NAME,
component)
}}}
E.g. a request made from BibUpload on my machine would then send:
{{{
Invenio 1.0.0.951-c6f69-dirty (+http://pcsk4.cern.ch; "Atlantis Institute of Sam") BibUpload
}}}
This code could be put in ''urlutils'' in a function called e.g.
{{{#!python
def make_user_agent_string(component=None):
    """Return the uniform Invenio User-Agent string for the given component."""
    pass
}}}
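As a starting point for discussion, the stub above might be filled in roughly as follows. This is only a sketch: the three `CFG_*` values are placeholders defined locally so the snippet runs on its own (in Invenio they would come from `invenio.config`), dropping the trailing component when none is given is an assumption, and the call site uses Python 3's `urllib.request` purely for illustration.

```python
import urllib.request

# Placeholder settings so the sketch is self-contained; in Invenio these
# would be imported from invenio.config instead of being defined here.
CFG_VERSION = "1.0.0"
CFG_SITE_URL = "http://example.org"
CFG_SITE_NAME = "Example Site"


def make_user_agent_string(component=None):
    """Return the uniform Invenio User-Agent string.

    `component` names the calling module (e.g. "BibUpload"). Omitting the
    suffix when no component is given is an assumption of this sketch.
    """
    user_agent = 'Invenio %s (+%s; "%s")' % (
        CFG_VERSION, CFG_SITE_URL, CFG_SITE_NAME)
    if component:
        user_agent += " %s" % component
    return user_agent


# Hypothetical call site: building the Request object does not contact
# the network, it only prepares the headers.
request = urllib.request.Request("http://example.org/")
request.add_header("User-Agent", make_user_agent_string("BibUpload"))
```

Each of the client-side call sites listed above could then replace its hard-coded string with a call to this one function, passing its own module name as `component`.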
--
Ticket URL: <http://invenio-software.org/ticket/997>
Invenio <http://invenio-software.org>