Hi,
I have some general scalability questions about Giraph. Based on the Giraph
design, I am assuming that all the mappers in a Giraph job must be running at
the same time.
If so, then
1. The maximum number of mappers for a Giraph job must be <= the total number
of mapper slots in the whole cluster.
2. The maximum input data size for Giraph should be <= total mapper slots *
mapper memory limit.
3. If the cluster has 200 mapper slots in total but only 100 are currently
available, and the Giraph job requires 150 mappers:
* Without any configuration change, 100 of the job's mappers will be
started, but the Giraph job will NOT run successfully.
* Is there any configuration in Giraph to start the job ONLY when all the
required mapper slots are available?
4. How well does Giraph scale? I can ONLY run up to 150 mappers for my
Giraph job. Has anyone successfully run a large Giraph job on a large cluster?
* I am using Giraph 0.1 in my cluster.
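To make items 1 to 3 concrete, here is a rough sketch of the checks I have in
mind. It is plain Java arithmetic, not a Giraph API: the class and method names
are hypothetical, and it assumes that every mapper must run concurrently and
that the whole input has to fit in the combined mapper memory.

```java
// Hypothetical capacity checks for items 1-3; not part of Giraph.
public class GiraphCapacityCheck {

    // Items 1 and 3: all mappers must run at once, so the job cannot use
    // more mappers than there are mapper slots free.
    static boolean enoughSlots(int requiredMappers, int availableSlots) {
        return requiredMappers <= availableSlots;
    }

    // Item 2 (assumption): the input must fit in mappers * per-mapper memory.
    static boolean inputFits(long inputBytes, int mappers, long memoryPerMapperBytes) {
        return inputBytes <= (long) mappers * memoryPerMapperBytes;
    }

    public static void main(String[] args) {
        int totalSlots = 200;
        int availableSlots = 100;
        int requiredMappers = 150;

        // Fits the cluster as a whole...
        System.out.println(enoughSlots(requiredMappers, totalSlots));     // true
        // ...but not the slots that are free right now (item 3).
        System.out.println(enoughSlots(requiredMappers, availableSlots)); // false

        long gib = 1L << 30;
        // 100 GiB of input over 150 mappers with 1 GiB each.
        System.out.println(inputFits(100 * gib, requiredMappers, gib));   // true
    }
}
```

In item 3 the job passes the first check but fails the second, which is why I
am asking whether Giraph can wait until enough slots are free.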
Thanks a lot for your time and input.
Min