Re: [Pharo-project] Nearly limitless Image: revisited.

Igor Stasenko Wed, 29 Feb 2012 11:27:55 -0800

2012/2/29 Janko Mivšek:
> A good compromise is a step by step approach, something like:
>
>  1.step: image based persistence up to 1GB with hourly snapshot
>  2.step: parts migrated to Fuel and file based persistence
>  3.step: Gemstone, with "images" running in parallel (well, any DB
>          with images in parallel, Gemstone is certainly the easiest to
>          scale from image based persistence)
>
> The 1GB limit is here just for simplicity, you can probably go further with
> 64bit images.
>
Yes, you can. But then you will definitely need a different GC strategy,
or to change the GC itself..
and this immediately turns advantages like
  - very very simple
into
  - very very hard
because hacking the GC is not that simple :)
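
To make the linear-cost point concrete, here is a toy sketch in Python. It is
not the Squeak/Pharo VM's collector; the names build_heap and mark and the
"two references per object" layout are assumptions made up for illustration.
It only counts how many references a naive mark phase has to follow.

```python
# Toy model, NOT the real VM collector: objects are integers and
# `heap` maps each object to the list of objects it references.

def build_heap(n_objects, refs_per_object=2):
    """Hypothetical heap where object i references the next few objects."""
    heap = {}
    for i in range(n_objects):
        heap[i] = [(i + k) % n_objects for k in range(1, refs_per_object + 1)]
    return heap

def mark(heap, roots):
    """Naive mark phase: visit every reachable object exactly once.

    Returns the set of marked objects and the number of references
    followed, which serves as a stand-in for marking time.
    """
    marked = set()
    stack = list(roots)
    refs_followed = 0
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already visited through another reference
        marked.add(obj)
        for target in heap[obj]:
            refs_followed += 1
            stack.append(target)
    return marked, refs_followed

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        marked, refs = mark(build_heap(n), roots=[0])
        print(n, len(marked), refs)
```

In this model, doubling the number of live objects doubles the references the
marker must follow. A smarter collector can shrink the constant factor, but it
still has to look at the live data.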


> Advantages:
>
>  - very very simple start
>  - freedom of pure OO modeling,
>  - good enough for probably 90% of all projects
>  - fastest way from your dreams to reality
>  - speed of development - no impedance mismatch, no ORM nightmare
>  - you won't believe how much data you can put in 1GB image
>  - speed because of always in-memory data processing
>  - you can always scale further if you make your design from the start
>    with the above steps in mind
>  - reliability good enough, on reliable hardware probably even better
>    than more complex solutions. Main reason: simplicity.
>
> Disadvantages:
>
>  - you easily forget to include later scalability requirements in
>    upfront design
>  - such scaling is easy only to an OO database, while migrating to a
>    NoSQL (not to mention SQL) database later is very hard if not
>    impossible
>  - up to about 1GB only, because of GC problems as Igor described
>  - active users limit (number of requests/s)
>  - single point of failure
>  - corrupted image will lose all data (but good backup approach helps)
>  - undetected image corruption fear (after many otherwise successful
>    snapshots, causing non-startable image)
>  - lengthy snapshots of bigger images (can be improved with two step
>    snapshots, first in memory, then on disk)
>  - loss of data between snapshots in case of power or machine failure
>    (but this is very rare these days)
>
> Best regards
> Janko
>
> S, Igor Stasenko wrote:
>> Incidentally, we had a little chat with Marcus yesterday about that.
>>
>> No, I don't think it is feasible to use a single image to store everything.
>> It is convenient, cheap and of course way better than dealing
>> with communication with external DBs/servers or whatever.
>>
>> But there's one thing you should know already: the days of vertical
>> growth are over.
>>
>> Running a service (under VM or not) on a single machine is asking for
>> trouble:
>>  - limits on load
>>  - susceptible to power outage and other reliability problems
>>  etc
>>
>> Also, consider that the amount of data you need to process correlates
>> with the CPU horsepower available.
>> Which means that yes, you can run a huge image with 64GB of data in it..
>> but the responsiveness
>> of your service will quite often fall below any usability limits.
>>
>> If we look in terms of the VM and pick only one thing - garbage collection -
>> you will see that there are certain limits beyond which performance
>> will drop too much, so you naturally
>> will start thinking about ways to split the data into separate chunks and
>> run them on different machines/VMs.
>>
>> This is because the GC's mark algorithm is O(n) bound, where n is the total
>> number of references between objects,
>> and the GC's scavenge algorithm is at best O(n) bound, where n is the total
>> number of objects in object memory,
>> and at worst O(n) where n is the total memory used by objects.
>> No matter how you turn it, I just wanted to indicate that the time to run
>> GC depends linearly on the amount of data.
>>
>> Yes, we might invest a lot of effort in making the GC more clever, more
>> complex and more robust.. but no matter what you do,
>> you cannot change the above facts. It means that any improvements
>> will bring diminishing returns, but won't change the picture
>> radically.
>>
>> That means that sooner or later you will have to deal with it: the
>> problem of splitting data into multiple independent chunks
>> and making your service run on multiple machines, in order to use
>> more CPU power, more memory, be more reliable etc.
>> At this point, your main dilemma is to invent fast and robust
>> interfaces to communicate between images or between image(s) and database
>> etc.
>>
>> We should concentrate on things that deal with inter-image
>> communication and image-database communication,
>> because it is the only way to ensure that we can answer upcoming
>> problems. Relying on a single huge image is a road to
>> nowhere.
>>
>
> --
> Janko Mivšek
> Aida/Web
> Smalltalk Web Application Server
> http://www.aidaweb.si
>



-- 
Best regards,
Igor Stasenko.
