2012/2/29 Janko Mivšek <[email protected]>:
> A good compromise is a step-by-step approach, something like:
>
> 1. step: image-based persistence up to 1GB with hourly snapshots
> 2. step: parts migrated to Fuel and file-based persistence
> 3. step: Gemstone, with "images" running in parallel (well, any DB
>    with images in parallel; Gemstone is certainly the easiest to
>    scale to from image-based persistence)
>
> The 1GB limit is here just for simplicity; you can probably go further
> with 64-bit images.
>

Yes, you can. But then you will definitely need a different GC strategy,
or you will have to change the GC, and this immediately turns advantages like
 - very very simple
into
 - very very hard
because hacking the GC is not that simple :)

> Advantages:
>
> - very very simple start
> - freedom of pure OO modeling
> - good enough for probably 90% of all projects
> - fastest way from your dreams to reality
> - speed of development - no impedance mismatch, no ORM nightmare
> - you won't believe how much data you can put in a 1GB image
> - speed because of always in-memory data processing
> - you can always scale further if you make your design from the start
>   with the above steps in mind
> - reliability good enough, on reliable hardware probably even better
>   than more complex solutions. Main reason: simplicity.
>
> Disadvantages:
>
> - you easily forget to include later scalability requirements in
>   the upfront design
> - such scaling is easy only to an OO database, while migrating to a
>   NoSQL (not to mention SQL) database later is very hard if not
>   impossible
> - up to about 1GB only, because of GC problems as Igor described
> - active users limit (number of requests/s)
> - single point of failure
> - a corrupted image will lose all data (but a good backup approach helps)
> - fear of undetected image corruption (after many otherwise successful
>   snapshots, causing a non-startable image)
> - lengthy snapshots of bigger images (can be improved with two-step
>   snapshots, first in memory, then on disk)
> - loss of data between snapshots in case of power or machine failure
>   (but this is very rare these days)
>
> Best regards
> Janko
>
> S, Igor Stasenko writes:
>> Incidentally, we had a little chat with Marcus yesterday about that.
>>
>> No, I don't think it is feasible to use a single image to store everything.
>> It is convenient, cheap and of course it is way better than dealing
>> with communicating with external DBs/servers/whatever.
>>
>> But there's one thing you should know already: the days of vertical
>> growth are over.
>>
>> Running a service (under a VM or not) on a single machine is asking for
>> trouble:
>>  - limits on load
>>  - susceptible to power outages and other reliability problems
>> etc.
>>
>> Also, consider that the amount of data you need to process correlates
>> with the CPU horsepower available.
>> Which means that yes, you can run a huge image with 64GB of data in it,
>> but then the responsiveness of your service will quite often fall
>> outside any usability limits.
>>
>> If we look in terms of the VM and pick only one thing, garbage collection,
>> you will see that there are certain limits beyond which performance
>> will drop too much, so you will naturally start thinking about ways to
>> split the data into separate chunks and run them on different machines/VMs.
>>
>> That is because the GC's mark algorithm is O(n) bound, where n is the total
>> number of references between objects,
>> and the GC's scavenge algorithm is at best O(n) bound, where n is the total
>> number of objects in object memory,
>> and at worst O(n) bound, where n is the total memory used by objects.
>> No matter how you turn it, I just wanted to indicate that the time to run
>> the GC depends linearly on the amount of data.
>>
>> Yes, we might invest a lot of effort in making the GC more clever, more
>> complex and more robust, but no matter what you do,
>> you cannot change the above facts. It means that any improvements
>> will be subject to diminishing returns and won't change the picture
>> radically.
>>
>> That means that sooner or later you will have to deal with it: the
>> problem of splitting data into multiple independent chunks
>> and making your service run on multiple machines, in order to use
>> more CPU power, more memory, be more reliable, etc.
>> At this point, your main dilemma is to invent fast and robust
>> interfaces to communicate between images or between image(s) and a
>> database, etc.
>>
>> We should concentrate on things that deal with inter-image
>> communication and image-database communication,
>> because it is the only way to ensure that we can answer upcoming
>> problems. Relying on a single huge image is a road to
>> nowhere.
>>
>
> --
> Janko Mivšek
> Aida/Web
> Smalltalk Web Application Server
> http://www.aidaweb.si
>

--
Best regards,
Igor Stasenko.
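
To make the quoted point about the mark phase concrete, here is a minimal sketch in Python. It is not Smalltalk VM code and not how any particular VM implements its collector; the names `Obj`, `mark` and `build_ring` are hypothetical and exist only for this illustration. It counts how many references a simple mark traversal has to follow, which is the quantity the O(n) argument above is about.

```python
class Obj:
    """Hypothetical heap object: a mark bit plus outgoing references."""
    __slots__ = ("refs", "marked")

    def __init__(self):
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark everything reachable from roots.

    Returns the number of references followed, used here as a stand-in
    for the work done by the mark phase.
    """
    followed = 0
    stack = []
    for root in roots:
        if not root.marked:
            root.marked = True
            stack.append(root)
    while stack:
        obj = stack.pop()
        for child in obj.refs:
            followed += 1  # every reference is examined exactly once
            if not child.marked:
                child.marked = True
                stack.append(child)
    return followed


def build_ring(n, fanout=2):
    """Build n objects; object i references the next `fanout` objects, wrapping around."""
    objs = [Obj() for _ in range(n)]
    for i, obj in enumerate(objs):
        obj.refs = [objs[(i + k) % n] for k in range(1, fanout + 1)]
    return objs


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        heap = build_ring(n)
        print(n, "objects ->", mark([heap[0]]), "references followed")
```

In this toy heap every object is reachable and holds two references, so the traversal follows exactly 2 * n references: ten times more data means ten times more marking work, regardless of how the traversal itself is tuned.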
