Hi,
~1.5 GB is not enough for an index of ~4 million pages?
It's more than enough! There must be a different problem.
Stefan Groschupf> As far as I know, 2 GB is the maximum
Stefan Groschupf> manageable memory for Java on 32-bit
Stefan Groschupf> OS systems, isn't it?
That is bad... Has anybody tried to use a 64-bit JVM with Nutch? Which problems can arise? Will Nutch work on a 64-bit JVM?
I think the 2 GB limit is not a Nutch problem but a JVM problem, so perhaps I can share some experience from the last few months. I had to run self-written load-test software under Java and ran several times into the heap limit of the JVM.
Linux (Gentoo, amd64, 4 GB RAM)
- Blackdown JDK 1.4.2.01 -> max 2 GB limit. You can configure more, but it will crash.
- Sun JDK 1.4.2.06 (32-bit version, no 64-bit version available!!!) -> ran more stably, but (not surprisingly) with the same result: 2 GB max.
Sun (Solaris 8, 4 GB RAM)
- Sun JDK 1.4.2.06 (incl. 64-bit patch) -> no problems at all using the full 4 GB.
I would expect that Nutch runs into the same limits.
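
If somebody wants to check where his own JVM gives up, a little probe like the following works (plain standard library, nothing Nutch-specific, and the class name is just made up). Run it with different -Xmx settings and watch where allocation really stops:

    import java.util.ArrayList;
    import java.util.List;

    public class HeapProbe {
        public static void main(String[] args) {
            // What the JVM itself claims as the ceiling (roughly the -Xmx value).
            long max = Runtime.getRuntime().maxMemory();
            System.out.println("maxMemory reported: " + (max / (1024 * 1024)) + " MB");

            // Grab 64 MB chunks until the heap is exhausted.
            List blocks = new ArrayList();
            try {
                while (true) {
                    blocks.add(new byte[64 * 1024 * 1024]);
                }
            } catch (OutOfMemoryError e) {
                int grabbed = blocks.size();
                blocks.clear(); // release the memory so the println below cannot OOM as well
                System.out.println("OutOfMemoryError after about " + (grabbed * 64) + " MB");
            }
        }
    }

On a 32-bit JVM I would expect the second number to stay below the ~2 GB mark, no matter what -Xmx says.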
But does this really matter? As far as I understand the design, I would try to get many small servers and distribute the load over them. This is still cheaper than one big server, and the main constraint I see is the needed disk space.

My smallest (and oldest) system is a PIII-800 with 384 MB RAM. It successfully managed to collect an index of about 8 million pages. Then the disk was full :-) OK, the frontend is separate and I have to be patient when updatedb runs, but it works, stable and fine ... :-)
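
In case it helps, the distributed setup I mean looks roughly like this (a sketch from memory, so the hostnames and paths are made up and the exact command line may differ between Nutch versions). Each small box runs a search server over its own index, and the frontend just lists them:

    # on each small search box (path is hypothetical):
    bin/nutch server 9999 /data/nutch/crawl

    # on the frontend, in the directory that searcher.dir points to,
    # a search-servers.txt with one "host port" line per box:
    box1.example.com 9999
    box2.example.com 9999
    box3.example.com 9999

The frontend then fans each query out to all the boxes and merges the results, which is why a bunch of small machines can keep up with one big one.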
regards
Michael
