Hi,
I have indexed millions of files, ending up with a 127G index file, which
works fine. There are enough resources for this.
I also tried to do the same with tens of millions of files, but the indexing
process could never finish, even with enough resources (index file ~400G). It
kept updating one file a tiny bit every few minutes. I think I could do a
better job in the code, but I have not been able to get back to it yet.
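For what it's worth, one common way to keep the indexer's footprint down is to
commit in smaller batches rather than in one giant session, so each Indexer
gets flushed to disk before the next batch starts. A rough sketch is below;
the schema, field names, index path, and batch size are only placeholders,
not what either of us is actually running:

    use strict;
    use warnings;
    use Lucy::Plan::Schema;
    use Lucy::Plan::FullTextType;
    use Lucy::Analysis::EasyAnalyzer;
    use Lucy::Index::Indexer;

    # Placeholder schema: a path field and a content field, both full-text.
    my $schema   = Lucy::Plan::Schema->new;
    my $analyzer = Lucy::Analysis::EasyAnalyzer->new( language => 'en' );
    my $type     = Lucy::Plan::FullTextType->new( analyzer => $analyzer );
    $schema->spec_field( name => 'path',    type => $type );
    $schema->spec_field( name => 'content', type => $type );

    my $index_path = '/path/to/index';   # placeholder location
    my $batch_size = 10_000;             # tune to available RAM
    my @files      = @ARGV;              # files to index

    while ( my @batch = splice @files, 0, $batch_size ) {
        # A fresh Indexer per batch keeps the in-memory segment small;
        # commit() flushes it to disk before the next batch starts.
        my $indexer = Lucy::Index::Indexer->new(
            schema => $schema,
            index  => $index_path,
            create => 1,
        );
        for my $file (@batch) {
            open my $fh, '<', $file or next;
            my $content = do { local $/; <$fh> };
            $indexer->add_doc( { path => $file, content => $content } );
        }
        $indexer->commit;
    }

Each commit() writes the batch out as its own segment, so the memory
high-water mark is tied to the batch size rather than to the whole run.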
-bob
On Thu, 25 Apr 2013, Edwin Crockford wrote:
I have recently started to use Lucy (with Perl) and everything went well
until I tried to index a large file store (>300,000 files). The indexer
process grew to more than 8 Gbytes and the machine ran out of resources. My
questions are:
a) Are these the normal resource requirements?
b) Is there a way to avoid swamping the machine?
I also found that the searcher process becomes very large for large indexes.
Since ours runs as part of a FastCGI process, it exceeded the process's
ulimit. Upping the ulimit fixed this, but diagnosing the issue was difficult
because the query would just return 0 results rather than indicating that it
had run out of process space.
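In hindsight, wrapping the hits() call in an eval and checking $@ would at
least have logged the real failure instead of making it look like an empty
result set. A rough sketch (the index path and field name are made up, and
this assumes the failure surfaces as a Perl-level exception rather than a
hard crash):

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;

    my $query_string = defined $ARGV[0] ? $ARGV[0] : 'example';   # placeholder
    my $searcher = Lucy::Search::IndexSearcher->new( index => '/path/to/index' );

    my $hits = eval { $searcher->hits( query => $query_string, num_wanted => 10 ) };
    if ( !defined $hits ) {
        # Log the real failure instead of silently reporting zero results.
        warn "Lucy search failed: $@";
    }
    else {
        while ( my $hit = $hits->next ) {
            print "$hit->{path}\n";    # 'path' is a made-up field name
        }
    }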
Many thanks
Edwin Crockford
--
Dr. Robert Bruen
Cold Rain Labs
http://coldrain.net/bruen
+1.802.579.6288