Hello all. I am using ManifoldCF to index a Windows share containing well
over 160,000 files (.xls, .pdf, .doc). I keep getting GC overhead memory errors
when I try to index the whole folder at once, and giving Tomcat and the VM more
memory and CPU hasn't fixed it, so I thought I'd try a different approach.
 
What I'd like to do now is split the single job into several jobs. Each job
should index every indexable file under a subset of the folders in the parent
folder: one job for the folders whose names begin with A-G (including all of
their subfolders and files), another for H-M (again with all subfolders and
files), and so on. My problem is that I can't figure out which expression will
make a job index just those folders.
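A minimal sketch of the split itself, in plain Python and outside ManifoldCF entirely: it buckets top-level folder names by first letter so that each bucket could become one job. The A-G and H-M ranges come from the post; N-S and T-Z, the sample folder names, and the names `RANGES`, `job_for` and `partition` are hypothetical placeholders for "and so on".

```python
from collections import defaultdict
from typing import Optional

# A-G and H-M come from the post; N-S and T-Z are hypothetical
# placeholders for "and so on".
RANGES = [("A", "G"), ("H", "M"), ("N", "S"), ("T", "Z")]


def job_for(folder_name: str) -> Optional[str]:
    """Return the job label (e.g. "A-G") for a top-level folder name."""
    if not folder_name:
        return None
    first = folder_name[0].upper()
    for lo, hi in RANGES:
        if lo <= first <= hi:
            return f"{lo}-{hi}"
    return None  # digits, punctuation, accented initials, ...


def partition(folder_names):
    """Group top-level folder names into one bucket per job."""
    buckets = defaultdict(list)
    for name in folder_names:
        buckets[job_for(name)].append(name)
    return dict(buckets)


print(partition(["Accounts", "budget", "HR", "Projects", "2019"]))
```

Any folder whose name starts with a digit or a symbol lands in the `None` bucket, so a letter-range split probably needs one extra catch-all job to make sure nothing is silently skipped.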
 
In the job settings under Paths, I have specified the parent folder, and under
it I've tried the following rules:
 
1.  Include file(s) or directory(s) matching *  (this works, but indexes every
file in every folder within the parent, and eventually the job fails with GC
overhead memory errors I haven't been able to resolve)
2.  Include file(s) or directory(s) matching ^(?i)[A-G]*  (this does not work;
it apparently indexes one file and then quits)
3.  Include file(s) or directory(s) matching A*  (this does not work either; it
also apparently indexes one file and then quits, even though many folders
directly under the parent begin with 'A')
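It isn't clear from the rules above whether the Paths field expects a shell-style wildcard or a regular expression. The plain-Python sketch below (not ManifoldCF code) shows how the same strings behave under each reading, using `fnmatch` for wildcards and `re` for regular expressions; the sample names in `NAMES` and the helper names are made up. Python requires an inline `(?i)` flag at the very start of a pattern, so in the regex case it is moved ahead of `^`, which does not change what the pattern matches.

```python
import fnmatch
import re

# Made-up names standing in for entries directly under the parent folder.
NAMES = ["Accounts", "budget.xls", "HR", "Zebra.pdf"]


def glob_match(pattern, name):
    """Shell-style wildcard match, case-insensitive like Windows file names."""
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


def regex_match(pattern, name):
    """Regular-expression match anchored at the start of the name."""
    return re.match(pattern, name) is not None


for pattern in ["A*", "[A-G]*", "^(?i)[A-G]*"]:
    print("glob ", pattern, [n for n in NAMES if glob_match(pattern, n)])

for pattern in [r"(?i)^[A-G]*", r"(?i)^[A-G]"]:
    print("regex", pattern, [n for n in NAMES if regex_match(pattern, n)])
```

Read as a wildcard, `^(?i)[A-G]*` matches none of the names, because `^`, `(` and `)` are ordinary characters there. Read as a regex, `[A-G]*` may match zero characters, so it accepts every name; `(?i)^[A-G]` is the regex form of "starts with A through G".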
 
Can anyone confirm what kind of expression (wildcard or regular expression) I
should use in the Paths rules to accomplish this?
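One hypothesis that would fit the "one file and then quits" symptom, stated as an assumption rather than documented ManifoldCF behavior: the include rule is tested against the name of every file and directory at every level, and a rejected directory is never descended into. Under that assumption `A*` keeps the folder `Accounts` but then rejects every file inside it unless that file's name also starts with 'A'. The toy crawler below models only that assumption; `TREE`, `crawl` and both rule functions are hypothetical.

```python
import fnmatch

# A tiny made-up share: dicts are folders, None marks a file.
TREE = {
    "Accounts": {"report.xls": None, "2019": {"budget.pdf": None}},
    "Archive.doc": None,
    "HR": {"policy.doc": None},
}


def crawl(tree, include, depth=0, prefix=""):
    """Yield file paths, calling include(name, depth) on every entry.

    A rejected file is skipped; a rejected folder is never descended into.
    """
    for name, child in tree.items():
        if not include(name, depth):
            continue
        path = f"{prefix}\\{name}" if prefix else name
        if child is None:
            yield path
        else:
            yield from crawl(child, include, depth + 1, path)


def same_rule_everywhere(name, depth):
    # Hypothetical reading of rule 3: "A*" applied at every level.
    return fnmatch.fnmatchcase(name, "A*")


def letter_at_top_only(name, depth):
    # Filter only the top level by first letter; accept everything below it.
    return depth > 0 or fnmatch.fnmatchcase(name.upper(), "[A-G]*")


print(list(crawl(TREE, same_rule_everywhere)))  # → ['Archive.doc']
print(list(crawl(TREE, letter_at_top_only)))
```

With the same rule applied everywhere, only the one top-level file survives; restricting just the top level keeps the whole `Accounts` subtree. If the connector really works this way, a rule that constrains only the first level would be the thing to look for in its documentation.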
 
Alternatively, if you think I should be able to index 160,000+ files in one job
without getting GC overhead errors, I'd welcome suggestions for resolving them.
All I know to do is raise Tomcat's maximum memory and give the OS more memory,
and that didn't help at all.
 
Thanks much!
 
-Ian
