Incidentally, we had a little chat with Marcus yesterday about that.

No, I don't think it is feasible to use a single image to store everything.
It is convenient, cheap, and of course way nicer than dealing
with communication with an external DB, servers or whatever.

But there's one thing you should know already: the days of vertical
growth are over.

Running a service (under a VM or not) on a single machine is asking for trouble:
 - limits on load
 - susceptibility to power outages and other reliability problems
 - etc.

Also, keep in mind that the amount of data you can process is tied
to the CPU horsepower available.
Which means that yes, you can run a huge image with 64 GB of data in it,
but the responsiveness
of your service will quite often fall below any usable limit.

If we look at it in terms of the VM and pick only one thing, garbage
collection, you will see that there are certain limits beyond which
performance drops too much, so you will naturally
start thinking about ways to split the data into separate chunks and
run them on different machines/VMs.

This is because the GC's mark algorithm is O(n), where n is the total
number of references between objects,
and the GC's scavenge algorithm is at best O(n), where n is the total
number of objects in object memory,
and at worst O(n), where n is the total memory used by objects.
No matter how you turn it, the point is that the time to run
the GC depends linearly on the amount of data.
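
To make the linear dependency concrete, here is a minimal sketch of a toy
mark phase in Python. It is not how any real VM implements its GC; the
object graph is just a dict, and the names mark and chain are made up
for this illustration. It only counts how many references the marker has
to follow as the heap grows.

```python
from collections import deque

def mark(roots, refs):
    """Toy mark phase over a graph given as {object id: [referenced ids]}.

    Returns the set of reachable ids and the number of references followed.
    """
    marked = set(roots)
    queue = deque(roots)
    followed = 0
    while queue:
        obj = queue.popleft()
        for target in refs.get(obj, ()):
            followed += 1  # one unit of work per reference
            if target not in marked:
                marked.add(target)
                queue.append(target)
    return marked, followed

def chain(n):
    """Hypothetical heap: n objects linked as 0 -> 1 -> ... -> n-1."""
    return {i: [i + 1] for i in range(n - 1)}

# Ten times more live data means ten times more references to follow.
for n in (1_000, 10_000, 100_000):
    marked, followed = mark([0], chain(n))
    print(n, len(marked), followed)
```

The work counter grows in step with the number of references, whatever
cleverness you add around it.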

Yes, we might invest a lot of effort in making the GC more clever, more
complex and more robust, but no matter what you do,
you cannot change the above facts. It means that any improvements
will bring diminishing returns, but won't change the picture
radically.

That means that sooner or later you will have to deal with it: the
problem of splitting data into multiple independent chunks
and making your service run on multiple machines, in order to use
more CPU power, more memory, and be more reliable.
At that point, your main problem is to come up with fast and robust
interfaces for communication between images, or between image(s) and
a database.
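
As a rough illustration of what splitting data into independent chunks
can look like, here is a small Python sketch that assigns each record to
a shard by hashing its key. The names shard_for and partition, the
choice of SHA-256 and the "userN" keys are all assumptions made for this
example; a real setup would also need the communication layer between
the pieces, which is exactly the hard part.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(records, num_shards):
    """Split a dict of records into num_shards independent dicts."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards

# Hypothetical data set, split across 4 images/machines.
records = {f"user{i}": i for i in range(1000)}
shards = partition(records, 4)
print([len(s) for s in shards])
```

Each shard can then live in its own image, so each GC only has to walk
its own part of the data.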

We should concentrate on the things that deal with inter-image
communication and image-database communication,
because that is the only way to make sure we can answer the problems
that are coming. Relying on a single huge image is a road to
nowhere.

-- 
Best regards,
Igor Stasenko.
