#674: MontySolr - batch import/upload
------------------------+------------------------------
Reporter: rchyla | Owner: rchyla
Type: defect | Status: in_merge
Priority: major | Milestone:
Component: *general* | Version:
Resolution: | Keywords: montysolr search
------------------------+------------------------------
Changes (by rchyla):
* status: new => in_merge
Comment:
This functionality is now available in:
https://github.com/romanchyla/montysolr/commit/089f30ef5cf1cbfc1c8ef86f6f3fa640c1cfb1e4
The handler is invoked as follows:
http://localhost:8983/solr/invenio_update?last_recid=100&index=true&datasource=http://inspirebeta.net/search&importurl=http%3A%2F%2Flocalhost%3A8983%2Fsolr%2Fwaiting-dataimport%3Fcommand%3Dfull-import%26dirs%3D%2FVolumes%2Fafs_arxiv%2Fharvests-from-amazon-s3%2C%2FVolumes%2Fafs_arxiv%2Fharvests-from-amazon-s3-part2%2C%2FVolumes%2Fafs_arxiv%2Fharvests-from-amazon-s3-part3
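Because importurl is itself a URL with its own query string, it has to be percent-encoded as a single parameter value. A minimal sketch in Python that rebuilds the example request above with the standard library; build_update_url, SOLR, HARVEST_DIRS and IMPORT_URL are hypothetical names, and the parameter values are copied from the example:

```python
from urllib.parse import urlencode

# Hypothetical constants; values copied from the example invocation.
SOLR = "http://localhost:8983/solr"
HARVEST_DIRS = [
    "/Volumes/afs_arxiv/harvests-from-amazon-s3",
    "/Volumes/afs_arxiv/harvests-from-amazon-s3-part2",
    "/Volumes/afs_arxiv/harvests-from-amazon-s3-part3",
]

# The nested DIH call, written unencoded; it is encoded once, below.
IMPORT_URL = (
    SOLR + "/waiting-dataimport?command=full-import&dirs=" + ",".join(HARVEST_DIRS)
)


def build_update_url(last_recid, datasource, importurl):
    """Return an invenio_update request URL (hypothetical helper).

    urlencode percent-encodes every value, so the nested importurl,
    with its own '?', '&' and '=', travels as one opaque parameter.
    """
    params = [
        ("last_recid", str(last_recid)),
        ("index", "true"),  # copied as-is from the example
        ("datasource", datasource),
        ("importurl", importurl),
    ]
    return SOLR + "/invenio_update?" + urlencode(params)


print(build_update_url(100, "http://inspirebeta.net/search", IMPORT_URL))
```

The encoded importurl matches the one in the example exactly. urlencode also encodes datasource (http%3A%2F%2F...), which the example leaves raw; both forms should decode to the same value on the Solr side.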
importurl is the URL of the Solr import handler, i.e. the DataImportHandler
(DIH); we simply call it and pass it some parameters.
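A sketch of how extra parameters could be merged into importurl while keeping the ones it already carries (command, dirs). The ticket does not say which parameters are passed, so with_extra_params and the url parameter used below are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_extra_params(import_url, extra):
    """Return import_url with the key/value pairs of `extra` appended.

    Existing parameters are kept in their original order; the whole
    query string is re-encoded on the way out.
    """
    parts = urlsplit(import_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with_extra_params("http://localhost:8983/solr/waiting-dataimport?command=full-import", {"url": "http://inspirebeta.net/search?p=recid:200&of=xm"}) keeps command=full-import and appends the encoded url value after it.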
datasource is the URL from which the DIH fetches the data. We use it to
construct the query: for example, if docs 104-109 and 200 were updated, the
query will be http://inspirebeta.net/search?p=recid:104->109 OR
recid:200&of=xm
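The range collapsing in that example can be sketched as follows, assuming the updated record IDs arrive as a list of integers; recids_to_query and datasource_query_url are hypothetical helper names, not MontySolr's API:

```python
from urllib.parse import urlencode


def _term(start, end):
    """One search term: a single recid or an inclusive range."""
    return f"recid:{start}" if start == end else f"recid:{start}->{end}"


def recids_to_query(recids):
    """Collapse record IDs into an Invenio search pattern.

    Consecutive IDs become a range ("recid:104->109"), isolated IDs
    stay single ("recid:200"), and the terms are joined with OR.
    """
    ids = sorted(set(recids))
    if not ids:
        raise ValueError("no record IDs to query")
    parts = []
    start = prev = ids[0]
    for rid in ids[1:]:
        if rid == prev + 1:
            prev = rid  # still inside a consecutive run
            continue
        parts.append(_term(start, prev))
        start = prev = rid  # begin a new run
    parts.append(_term(start, prev))  # flush the last run
    return " OR ".join(parts)


def datasource_query_url(datasource, recids):
    """Build the URL to fetch; of=xm asks Invenio for MARCXML output."""
    return datasource + "?" + urlencode({"p": recids_to_query(recids), "of": "xm"})


updated = [104, 105, 106, 107, 108, 109, 200]
print(recids_to_query(updated))  # recid:104->109 OR recid:200
print(datasource_query_url("http://inspirebeta.net/search", updated))
```

Collapsing runs into ranges keeps the URL short even when many consecutive records change; the second line prints the same query as the example, percent-encoded.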
The import takes advantage of the existing import mechanism, which makes it
very flexible (and means there is only one configuration to maintain).
It does not access the data through the Python/MySQL API but through HTTP
requests. That also means it will probably be slower than direct MySQL
access; if that becomes an issue, we will have to implement it differently,
by extending the DataImportHandler mechanism.
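To make the HTTP-based access concrete, a minimal sketch of what fetching from the datasource amounts to: one HTTP GET returning MARCXML (of=xm), from which the record IDs can be read. The helper names are hypothetical, and reading the recid from controlfield 001 assumes the usual Invenio MARCXML layout; the actual DIH configuration may map fields differently:

```python
import urllib.request
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def fetch_marcxml(query_url, timeout=30):
    """Fetch the raw MARCXML payload with a single HTTP GET."""
    with urllib.request.urlopen(query_url, timeout=timeout) as resp:
        return resp.read()


def recids_in(marcxml):
    """Return the record IDs (controlfield 001) found in a MARCXML payload."""
    root = ET.fromstring(marcxml)
    path = ".//marc:record/marc:controlfield[@tag='001']"
    return [int(cf.text) for cf in root.iterfind(path, MARC_NS)]
```

Every record travels as serialized XML over HTTP and is parsed again on the Solr side, which is where the expected slowdown compared with direct MySQL access would come from.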
--
Ticket URL: <http://invenio-software.org/ticket/674#comment:1>
Invenio <http://invenio-software.org>