Re: [Pharo-users] Nearly limitless Image: revisited.

Janko Mivšek Wed, 29 Feb 2012 09:55:58 -0800

A good compromise is a step-by-step approach, something like:

  Step 1: image-based persistence up to 1GB, with hourly snapshots
  Step 2: parts migrated to Fuel and file-based persistence
  Step 3: Gemstone, with "images" running in parallel (well, any DB
          with images in parallel; Gemstone is certainly the easiest to
          scale to from image-based persistence)


The 1GB limit is here just for simplicity; you can probably go further
with 64-bit images.

Advantages:

  - a very simple start
  - freedom of pure OO modeling
  - good enough for probably 90% of all projects
  - fastest way from your dreams to reality
  - speed of development: no impedance mismatch, no ORM nightmare
  - you won't believe how much data you can put in a 1GB image
  - speed, because data processing is always in memory
  - you can always scale further if you design from the start
    with the above steps in mind
  - reliability is good enough, and on reliable hardware probably even
    better than with more complex solutions. Main reason: simplicity.

Disadvantages:

  - you easily forget to include later scalability requirements in
    the upfront design
  - such scaling is easy only to an OO database, while migrating to a
    NoSQL (not to mention SQL) database later is very hard if not
    impossible
  - up to about 1GB only, because of GC problems as Igor described
  - limit on active users (number of requests/s)
  - single point of failure
  - a corrupted image will lose all data (but a good backup approach helps)
  - fear of undetected image corruption (after many otherwise successful
    snapshots, leaving a non-startable image)
  - lengthy snapshots of bigger images (can be improved with two-step
    snapshots, first in memory, then on disk)
  - loss of data between snapshots in case of power or machine failure
    (but this is very rare these days)
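
The two-step snapshot idea, and a way to catch a corrupted snapshot
before it replaces a good one, can be sketched roughly as follows. This
is only an illustration in Python, not how the Pharo VM saves an image:
pickle stands in for the image snapshot (or Fuel), and the names
snapshot and restore are made up for this example.

```python
import hashlib
import os
import pickle
import tempfile


def snapshot(model, path):
    """Two-step snapshot: serialize in memory first, then write to disk."""
    # Step 1 (in memory): only this part needs the model to hold still.
    payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")

    # Step 2 (on disk): write to a temporary file in the same directory,
    # force it to disk, then atomically replace the previous snapshot.
    # A crash during the write leaves the old snapshot untouched.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(digest + b"\n" + payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def restore(path):
    """Load a snapshot, refusing it if the stored checksum does not match."""
    with open(path, "rb") as f:
        digest, _, payload = f.read().partition(b"\n")
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("snapshot checksum mismatch: " + path)
    return pickle.loads(payload)
```

Keeping a few older snapshot files around, rather than only the latest
one, would cover the case where the model itself was already damaged
when it was saved; the checksum only detects damage to the file.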

Best regards
Janko

Igor Stasenko wrote:
> Incidentally, we had a little chat with Marcus yesterday about that.
> 
> No, I don't think it is feasible to use a single image to store everything.
> It is convenient, cheap and of course way better than dealing
> with communicating with external DBs/servers, whatever.
> 
> But there's one thing you should know already: the days of vertical
> growth are over.
> 
> Running a service (under a VM or not) on a single machine is asking for
> trouble:
>  - limits on load
>  - susceptibility to power outages and other reliability problems
>  etc
> 
> Also, consider that the amount of data you need to process correlates
> with the CPU horsepower available.
> Which means that yes, you can run a huge image with 64GB of data in it,
> but the responsiveness of your service
> will quite often fall below any usability limit.
> 
> If we look in terms of the VM and pick only one thing, garbage collection,
> you will see that there are certain limits beyond which performance
> drops too much, so you will naturally
> start thinking about ways to split the data into separate chunks and
> run them on different machines/VMs.
> 
> This is because the GC's mark algorithm is O(n), where n is the total
> number of references between objects,
> and the GC's scavenge algorithm is at best O(n), where n is the total
> number of objects in object memory,
> and at worst O(n), where n is the total memory used by objects.
> No matter how you turn it, I just wanted to point out that the time to run
> the GC depends linearly on the amount of data.
> 
> Yes, we might invest a lot of effort in making the GC more clever, more
> complex and more robust, but no matter what you do,
> you cannot change the above facts. It means that any improvements
> will bring diminishing returns, but won't change the picture
> radically.
> 
> That means that sooner or later you will have to deal with it: the
> problem of splitting data into multiple independent chunks,
> and making your service run on multiple machines, in order to use
> more CPU power, more memory, be more reliable, etc.
> At this point, your main dilemma is to invent fast and robust
> interfaces to communicate between images or between image(s) and database,
> etc.
> 
> We should concentrate on things that deal with inter-image
> communication and image-database communication,
> because it is the only way to ensure that we can answer upcoming
> problems. Relying on a single huge image is a road to
> nowhere.
> 
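
The linear-cost argument in the quoted message can be illustrated with a
toy mark phase. This is a minimal sketch in Python, not the Pharo VM's
collector: the object graph is a plain dict from object id to the ids it
references, and mark and chain are hypothetical names for this example.
It counts how many references the marker has to follow.

```python
def mark(roots, references):
    """Mark everything reachable from roots.

    Returns (marked, followed): the set of reachable objects and the
    number of references that had to be followed to find them.
    """
    marked = set(roots)
    stack = list(marked)
    followed = 0
    while stack:
        obj = stack.pop()
        for target in references.get(obj, ()):
            followed += 1  # every reference of a live object is visited once
            if target not in marked:
                marked.add(target)
                stack.append(target)
    return marked, followed


def chain(n):
    """Build a graph of n objects where object i references object i + 1."""
    return {i: [i + 1] for i in range(n - 1)}


if __name__ == "__main__":
    for size in (1000, 2000, 4000):
        marked, followed = mark([0], chain(size))
        print(size, len(marked), followed)
```

Doubling the number of live objects in this toy graph doubles the number
of references followed, which is the linear dependency being described:
a cleverer collector can lower the constant factor, but each live
reference still has to be looked at.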

-- 
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si
