On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler 
> scheduling in SolrCloud. Can someone please post a link to that? I 
> have a requirement that the DIH should be initiated at a specific time 
> Monday through Friday.

Troy, is your question how to use scheduled tasks? Shawn pointed you in the 
right direction. I thought it more likely that you want to schedule a cron 
task to run on any of your servers running SolrCloud, and you want the job to 
run even if the cluster is degraded.

Here's an idea - schedule your job Monday on node 1, Tuesday on node 2, etc. 
That way, if the cluster is degraded (a node is down), re-indexing/delta 
indexing still happens, it just happens slower. You can certainly write a 
ZooKeeper client to make each cron job compete to see who does the job - 
questions on how to do this should be directed to a ZooKeeper users' mailing 
list.
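
A minimal sketch of the one-node-per-weekday idea, assuming each node is 
given a number from 1 (Monday) to 5 (Friday). The function name and the 
numbering scheme are hypothetical, not anything Solr provides:

```python
import datetime


def should_run_today(node_number, today=None):
    """Return True if this node owns today's import.

    node_number is an assumed convention: 1 for the node that runs on
    Monday, 2 for Tuesday, and so on up to 5 for Friday.
    """
    today = today or datetime.date.today()
    # date.isoweekday() returns 1 for Monday through 7 for Sunday,
    # so Saturday (6) and Sunday (7) never match nodes 1-5.
    return today.isoweekday() == node_number
```

Each node's cron job could call this first and exit early when it returns 
False. The same split can also be done without any code by giving each 
node's crontab entry a different day-of-week field.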

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, August 31, 2015 7:50 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler scheduling

On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler 
> scheduling in SolrCloud. Can someone please post a link to that? I 
> have a requirement that the DIH should be initiated at a specific time 
> Monday through Friday.

Every modern operating system (and most of the previous versions of every 
modern OS) has a built-in task scheduling system.  For Windows, it's literally 
called Task Scheduler.  For most other operating systems, it's called cron.

Including dataimport scheduling capability in Solr has been discussed, and I 
think someone even wrote a working version ... but since every OS already has 
scheduling capability that has had years of time to mature, why should Solr 
reinvent the wheel and take the risk that the implementation will have bugs?
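
As a rough illustration of what the scheduler would run, here is a sketch of 
a small script that asks DIH to start an import over HTTP. The host, port, 
collection name and the /dataimport handler path are assumptions and need to 
match your own solrconfig.xml:

```python
#!/usr/bin/env python3
"""Trigger a DataImportHandler import over HTTP; meant to be run from cron."""
import sys
import urllib.parse
import urllib.request

# Assumed values for illustration; adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"   # hypothetical collection name
HANDLER = "/dataimport"       # assumes DIH is registered at this path


def build_dih_url(base, collection, command="full-import", handler=HANDLER, **params):
    """Build the request URL for a DIH command (full-import, delta-import, ...)."""
    query = {"command": command, "wt": "json"}
    query.update(params)  # extra DIH parameters, e.g. clean="false"
    return f"{base.rstrip('/')}/{collection}{handler}?{urllib.parse.urlencode(query)}"


def trigger_import(url, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dih_import.py full-import   (or delta-import)
    status = trigger_import(build_dih_url(SOLR_BASE, COLLECTION, sys.argv[1]))
    sys.exit(0 if status == 200 else 1)
```

With cron, an entry along these lines would run it Monday through Friday 
(the 02:30 start time and the script path are placeholders):

    30 2 * * 1-5 /usr/bin/python3 /opt/scripts/dih_import.py full-import

Note that DIH starts the import and returns right away, so a 200 response 
only means the request was accepted, not that the import finished.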

Currently virtually all updates to Solr's index must be initiated outside of 
Solr, and there is good reason to make sure that Solr doesn't ever modify the 
index without outside input.  The only thing I know of right now that can 
update the index automatically is Document Expiration, but the expiration time 
is decided when the document is indexed, and the original indexing action is 
external to Solr.

https://lucidworks.com/blog/document-expiration/

Thanks,
Shawn
