Hello,
we are ingesting documents from a Documentum system using ManifoldCF and
indexing them into Elasticsearch. The Documentum system consists of a few
hundred cabinets containing a few million documents in total. We have defined
about 80 ManifoldCF jobs, each processing some portion of the cabinets. The
jobs are scheduled to run once a day to pick up new/updated content. This setup
works pretty well for us, but we have now received a specific request for
almost 'real-time' ingestion and indexing.
The 'real-time' requirement does not apply to all documents, only to a small
subset of them. They are not stored in one place but are spread across all
cabinets, and we cannot identify their locations in advance to create a
dedicated job for them. Our idea for a solution is to run a DQL query that
returns the IDs of those documents and process only them, calling the query
frequently, e.g. every 5 minutes.
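For concreteness, the kind of DQL we have in mind would look roughly like the
sketch below; 'realtime_flag' is a hypothetical custom attribute marking these
documents, and the timestamp is a placeholder for the previous run time
(r_object_id and r_modify_date are standard dm_document attributes):

```sql
-- Return IDs of 'real-time' documents modified since the last run.
-- 'realtime_flag' is a hypothetical custom attribute; the date literal
-- stands in for the timestamp of the previous polling run.
SELECT r_object_id
FROM dm_document
WHERE realtime_flag = TRUE
  AND r_modify_date > DATE('01/01/2024 00:00:00')
```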
Can this somehow be done with ManifoldCF? Is there a way to pass only the IDs
of the documents I want to ingest into the ManifoldCF Documentum connector?
Would a solution be to derive a 'real-time' connector from the Documentum
connector, which would compute the set of documents for 'real-time' ingestion
using the DQL query and ingest them? And can a job be scheduled to run
frequently, e.g. at 5-minute intervals?
Thanks for any advice or suggestions,
Radko