Re: [Invenio] #674: MontySolr - batch import/upload

Invenio Trac Fri, 17 Jun 2011 17:01:24 +0200

#674: MontySolr - batch import/upload
------------------------+------------------------------
  Reporter:  rchyla     |      Owner:  rchyla
      Type:  defect     |     Status:  in_merge
  Priority:  major      |  Milestone:
 Component:  *general*  |    Version:
Resolution:             |   Keywords:  montysolr search
------------------------+------------------------------
Changes (by rchyla):


 * status:  new => in_merge


Comment:

 This functionality is now available in:
 
https://github.com/romanchyla/montysolr/commit/089f30ef5cf1cbfc1c8ef86f6f3fa640c1cfb1e4

 The handler is invoked as follows:

 
http://localhost:8983/solr/invenio_update?last_recid=100&index=true&datasource=http://inspirebeta.net/search&importurl=http%3A%2F%2Flocalhost%3A8983%2Fsolr%2Fwaiting-dataimport%3Fcommand%3Dfull-import%26dirs%3D%2FVolumes%2Fafs_arxiv%2Fharvests-from-amazon-s3%2C%2FVolumes%2Fafs_arxiv%2Fharvests-from-amazon-s3-part2%2C%2FVolumes%2Fafs_arxiv%2Fharvests-from-amazon-s3-part3

 importurl is the URL of the Solr import handler (we simply call it and
 pass it some parameters).

 datasource is the URL from which the DIH (DataImportHandler) will fetch
 data; we use it to construct the query. For example, if docs 104-109 and
 200 were updated, the query will be
 http://inspirebeta.net/search?p=recid:104->109 OR recid:200&of=xm
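
 To illustrate how a list of updated record ids maps onto such a query,
 here is a small Python sketch. It is not the MontySolr code; the function
 names recid_query and datasource_query_url are hypothetical, and the only
 thing taken from the ticket is the query syntax (recid:A->B for ranges,
 OR between terms, of=xm for the output format).

```python
def recid_query(recids):
    """Collapse record ids into 'recid:A->B OR recid:C' (illustrative only)."""
    ids = sorted(set(recids))
    if not ids:
        return ""
    ranges = []
    start = prev = ids[0]
    for recid in ids[1:]:
        if recid == prev + 1:
            # Still inside a consecutive run, extend it.
            prev = recid
            continue
        ranges.append((start, prev))
        start = prev = recid
    ranges.append((start, prev))
    parts = [
        f"recid:{lo}" if lo == hi else f"recid:{lo}->{hi}"
        for lo, hi in ranges
    ]
    return " OR ".join(parts)


def datasource_query_url(datasource, recids):
    # Shown unencoded to match the example in the ticket; a real HTTP
    # client would have to percent-encode the spaces and '>' first.
    return f"{datasource}?p={recid_query(recids)}&of=xm"


print(datasource_query_url(
    "http://inspirebeta.net/search",
    [104, 105, 106, 107, 108, 109, 200],
))
```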


 The import takes advantage of the existing import mechanism, which makes
 it very flexible (and makes it easy to keep only one configuration).

 It does not access data via the python/mysql API but via HTTP requests.
 That also means it will probably be slower than direct mysql access; if
 that is an issue, we will have to implement it differently, by extending
 the dataimport handler mechanism.

-- 
Ticket URL: <http://invenio-software.org/ticket/674#comment:1>
Invenio <http://invenio-software.org>
