Hello,
we are ingesting documents from a Documentum system using ManifoldCF and
indexing them into Elasticsearch. The Documentum system consists of a few
hundred cabinets containing a few million documents in total. We have defined
about 80 ManifoldCF jobs, each processing some portion of the cabinets. The
jobs are scheduled to run once a day to pick up new/updated content. This setup
works pretty well for us, but we have now received a specific request for
almost 'real-time' ingestion and indexing.
The 'real-time' requirement does not apply to all documents, only to a small
subset of them. They are not stored in one place but are spread across all
cabinets, and we cannot identify their locations in advance to create a
dedicated job for them. Our idea for a solution is to run a DQL query that
returns the IDs of those documents and process only them, calling the query
frequently, e.g. every 5 minutes.
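For concreteness, the kind of DQL we have in mind would look roughly like the
sketch below; 'realtime_flag' is a hypothetical custom attribute marking these
documents, and the timestamp is a placeholder for the previous run time
(r_object_id and r_modify_date are standard dm_document attributes):

```sql
-- Return IDs of 'real-time' documents modified since the last run.
-- 'realtime_flag' is a hypothetical custom attribute; the date literal
-- stands in for the timestamp of the previous polling run.
SELECT r_object_id
FROM dm_document
WHERE realtime_flag = TRUE
  AND r_modify_date > DATE('01/01/2024 00:00:00')
```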
Can this somehow be done with ManifoldCF? Is there a way to pass only the IDs
of the documents I want to ingest into the ManifoldCF Documentum connector?
Would a solution be to derive a 'real-time' connector from the Documentum
connector, which would compute the set of documents for 'real-time' ingestion
using the DQL query and ingest them? And can a job be scheduled to run
frequently, e.g. at 5-minute intervals?
Thanks for any advice or suggestions,
Radko