Hi Team,

PostgreSQL 9.2, Solr 5.3.2 and ManifoldCF 2.7.1 are installed on the same Linux box. The Documentum server sits on a different Linux box. Indexing with the Documentum crawler is slow (approximately 1000 documents per hour). The properties file in use is below for reference:
<configuration>
  <!-- Version string for UI -->

  <!-- Point to a specific (common) logging file -->
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
  <!-- Specify the connectors to be loaded -->
  <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
  <!-- Specify the path to the file resources directory -->
  <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
  <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
  <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
  <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
  <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
  <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.username" value="postgres"/>
  <property name="org.apache.manifoldcf.database.password" value=""/>
  <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
  <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>
  <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
  <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>

  <!-- Tell MCF where to find the connector jars -->
  <libdir path="../connector-lib"/>
  <libdir path="../connector-common-lib"/>
  <libdir path="../connector-lib-proprietary"/>

  <!-- Any additional local properties go here -->
</configuration>

Initially, org.apache.manifoldcf.crawler.threads was set to 45, and we observed a long gap between each batch of 45 documents during processing. Can you please point out any changes or recommendations that would speed up indexing?

Regards,
Tamizh Kumaran Thamizharasan
