#252: BibDocFile: Allow using custom HTTP headers to retrieve URLs
--------------------------+-------------------------------------------------
  Reporter:  bthiell      |       Owner:  bthiell 
      Type:  enhancement  |      Status:  assigned
  Priority:  major        |   Milestone:          
 Component:  WebSubmit    |     Version:          
Resolution:               |    Keywords:  headers 
--------------------------+-------------------------------------------------

Comment (by simko):

 1) BibDocFile is mostly seen as a library for manipulating internal
 files, so while this functionality would indeed be useful, we should
 name it so that its scope is clear.  By "accessing" external files, do
 you mean (i) accessing for uploading or (ii) accessing for indexing?

  * In the former case, fetching external files is part of the upload
    process, hence BibUpload.  The variable would live alongside, or
    be merged with, CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS.

  * In the latter case, indexing external files is part of the
    indexing process, hence BibIndex.  The variable would live
    alongside, or replace,
    CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY.

 So, depending on the needs, we could introduce up to two variables of
 the kind you propose, since they would serve two independent purposes.

 2) Such a dictionary could also tell Invenio which external files may
 be uploaded and which may not, through the presence or absence of a
 final catch-all wildcard stanza:

 {{{
 CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS = {
     'http://myurl.com/*': {'User-Agent': 'Me'},
     'http://yoururl.com/*': {'User-Agent': 'You'},
     'http://*': {'User-Agent': 'invenio-crawler'},
 }
 }}}

 which would permit replacing some of the existing CFG variables
 mentioned above.
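
 To make the lookup concrete, here is a minimal sketch of how such a
 dictionary might be consulted.  It is not existing Invenio code: the
 helper names headers_for_url and build_request are hypothetical, it
 assumes Python 3 with insertion-ordered dictionaries (so that the
 catch-all entry, listed last, is tried last), and it assumes
 shell-style wildcard matching as provided by fnmatch.

```python
from fnmatch import fnmatchcase
from urllib.request import Request

# Proposed setting from this ticket (not an existing Invenio variable).
CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS = {
    'http://myurl.com/*': {'User-Agent': 'Me'},
    'http://yoururl.com/*': {'User-Agent': 'You'},
    'http://*': {'User-Agent': 'invenio-crawler'},
}


def headers_for_url(url, allowed=CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS):
    """Return the headers of the first pattern matching url, else None.

    Hypothetical helper: patterns are tried in the order they are
    listed, so a catch-all such as 'http://*' should come last.
    """
    for pattern, headers in allowed.items():
        if fnmatchcase(url, pattern):
            return headers
    return None


def build_request(url, allowed=CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS):
    """Build (but do not send) a request carrying the configured headers.

    Hypothetical helper: a URL matching no pattern is refused.
    """
    headers = headers_for_url(url, allowed)
    if headers is None:
        raise ValueError('URL not allowed by configuration: %s' % url)
    return Request(url, headers=headers)
```

 Dropping the final 'http://*' entry would turn the dictionary into a
 whitelist: URLs outside the listed sites would match nothing and be
 refused.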

 (BTW this is kind of similar to, but more complete than,
 CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES.)

-- 
Ticket URL: <http://invenio-software.org/ticket/252#comment:3>
Invenio <http://invenio-software.org>
