Re: [Sugar-devel] Datastore redesign

2009-07-09 Thread Tomeu Vizoso
On Mon, Jul 6, 2009 at 16:33, Eben Eliason <e...@laptop.org> wrote:
 On Mon, Jul 6, 2009 at 10:02 AM, Sascha Silbe
 <sascha-ml-ui-sugar-de...@silbe.org> wrote:
 On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:

 Agreed. I have to say that your proposal is excellent, congratulations!

 Thanks, I'm flattered. :)


 Is the asynchronous API design useful enough to warrant more complex
 implementation?

 I'm not sure, but I think that whatever decision we take should be
 made based on actual usage of the DS. What about proposing an example
 of how an existing activity would be modified to use the new API?

 OK, will work on one.


  - For save() calls, the activity needs to wait for the result
    (containing the new version_id) before it can invoke save() again for
    the same object. That can take quite some time if save() is
    synchronous, especially if other activities are saving at the same time.

 What about having a separate call that returns synchronously a new
 tree_id and/or version_id?

 Interesting idea, need to think about it. As we're going to use UUIDs,
 leaving requested versions unused shouldn't be an issue (for other version
 number schemes, like the one you propose below, holes in the numbering
 could be troublesome).

 I think holes can be expected, so we should definitely be prepared for them.
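
A minimal sketch of the "separate synchronous call" idea, assuming IDs are
random UUIDs so they can be handed out without touching the disk. The names
allocate_version_id and PendingSaves are hypothetical and not part of any
proposed API; the point is only that reserved-but-unused IDs leave holes
that callers have to tolerate.

```python
import uuid


def allocate_version_id():
    """Return a fresh version_id immediately, without any disk access."""
    return str(uuid.uuid4())


class PendingSaves:
    """Hypothetical bookkeeping: IDs handed out vs. IDs actually saved."""

    def __init__(self):
        self.allocated = set()
        self.saved = {}

    def reserve(self):
        # Synchronous: the caller gets an ID right away.
        version_id = allocate_version_id()
        self.allocated.add(version_id)
        return version_id

    def complete(self, version_id, data):
        # Called once the (possibly slow) save has finished.
        self.saved[version_id] = data

    def holes(self):
        # IDs that were reserved but never saved.
        return self.allocated - set(self.saved)
```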


 Making the API fully asynchronous is the cause of much of the complexity
 of my proposal, but if we eliminate the queueing, the response times for
 write accesses and checkout() can be very long even for unrelated
 operations.

 Why for unrelated operations?

 Because we're serializing VCS operations. They are IO bound (more
 specifically: disk bound) and parallelisation would only lead to IO
 starvation, especially for HDDs.
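
As an illustration of serializing disk-bound operations behind an
asynchronous interface, here is a small sketch using a single worker thread
and a queue. SerialWorker is a hypothetical name, and using Python threads
is an assumption made for the example; the actual proposal may rely on a
different mechanism.

```python
import queue
import threading


class SerialWorker:
    """Run submitted operations one at a time, in submission order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, func, callback):
        # Returns immediately; callback(result) runs on the worker thread.
        self._jobs.put((func, callback))

    def _run(self):
        while True:
            func, callback = self._jobs.get()
            try:
                try:
                    result = func()
                except Exception as exc:
                    result = exc  # report the failure instead of dying
                callback(result)
            finally:
                self._jobs.task_done()

    def wait(self):
        # Block until every queued operation has finished.
        self._jobs.join()
```

Callers never block on submit(), but an operation queued behind a slow one
still has to wait its turn, which is the trade-off discussed above.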


 # do we want an optimized way to determine (only) the branch HEADs of
 a given tree_id?

 This depends on the intended UI. My opinion is that if we branch at
 every interesting modification (triggered by the activity detecting an
 interesting change or by the user clicking on the Keep button), we

 I don't think we need to branch in these instances. These events
 should create new versions, but not necessarily new branches.

In my proposal, a branch is not a concept directly exposed to the user.
It is just an artefact that allows the journal to display the relevant
info to a user. If we branch at the following events, displaying only
HEADs of branches in the journal list view makes sense for the user:

- new tree_id,
- resume entry,
- keep button,
- after a big change in the activity model (user deletes the whole
drawing, etc).

There are other ways to manage what we want, but this approach made it
very easy to implement.

Just to make sure I'm understood: I see why using branches this way
may seem conceptually wrong; it's not how we would work in a VCS or
CMS. But creating new branches at those points and displaying only
HEADs of branches in the list view is the simplest way I found of
displaying the relevant entries in a robust way (resisting activity
crashes).

 would like to display in the object list all the HEADs of each branch
 in each tree_id. In that case yes, we need a way to retrieve that list
 that is fast on both the client and the server side.
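
To make the "branch HEADs of a tree_id" query concrete, here is a sketch
under the assumption that each version records its parent version. A HEAD
is then any version that no other version names as its parent. The function
name and the dict layout are hypothetical; a real implementation would
presumably maintain an index rather than scan all versions.

```python
def branch_heads(parents):
    """Return the HEAD versions of one tree.

    `parents` maps version_id -> parent version_id (None for the root).
    """
    has_child = {p for p in parents.values() if p is not None}
    return sorted(v for v in parents if v not in has_child)


# v2 and v4 both derive from v1; v3 continues v2.
example = {"v1": None, "v2": "v1", "v3": "v2", "v4": "v1"}
heads = branch_heads(example)
```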

 My imagined usage of branches was to create them automatically upon altering
 a non-HEAD version.

 This makes sense to me, personally.

 A user basing off an old version could mean the newer version is broken
 (in that case promoting the new version to the HEAD of the current branch
 makes more sense) or that (s)he uses the older version as a kind of template
 to create derivatives (so creating a branch would make most sense).
 But I'm open to alternative suggestions. We'd most likely need a way to
 explicitly create branches then.
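
The "branch automatically upon altering a non-HEAD version" rule can be
sketched as follows. Tree is a hypothetical in-memory model using small
integers as IDs purely for readability; it is not the proposed data model.

```python
class Tree:
    """Hypothetical model of the versions belonging to one tree_id."""

    def __init__(self):
        self.parent = {}  # version_id -> parent version_id (or None)
        self.branch = {}  # version_id -> branch number
        self.head = {}    # branch number -> current HEAD version_id

    def save(self, parent=None):
        version_id = len(self.parent) + 1
        if parent is not None and self.head[self.branch[parent]] == parent:
            # Saving on top of a HEAD extends that branch.
            branch = self.branch[parent]
        else:
            # First version, or saving on top of an old version: new branch.
            branch = len(self.head)
        self.parent[version_id] = parent
        self.branch[version_id] = branch
        self.head[branch] = version_id
        return version_id
```

Promoting the new version to the HEAD of the existing branch instead (the
"newer version is broken" case) would need an explicit choice by the caller,
which this sketch leaves out.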


 # using symlink instead of hardlink for incoming queue since we want
 to support directory trees, not just files

 What justifies this new requirement?

 That it's
 a) of use to activities (IIRC some of them use ZIP files instead right now),

 This has its own merits, though. Encapsulating the related files as a
 single archive makes transporting the file around, or sending it to a
 friend, trivial. It's good for the same reason that activity bundles
 are good.

But this may be considered an internal implementation detail of the
DS, unless we want to support users directly browsing the DS backend.

 b) easy enough to achieve with the new design and
 c) leads to better delta compression and thus disk space efficiency.
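
For context on the symlink-versus-hardlink point: hard links to directories
are generally not permitted on Linux, whereas a symlink can point at a whole
directory tree. A minimal sketch, assuming a hypothetical enqueue() helper
and an incoming-queue directory chosen by the caller:

```python
import os


def enqueue(path, queue_dir):
    """Place an entry for `path` in `queue_dir` without copying any data.

    Works for single files and for directory trees alike.
    """
    target = os.path.join(queue_dir, os.path.basename(path))
    os.symlink(os.path.abspath(path), target)
    return target
```

Unlike a hard link, the symlink does not keep the data alive, so the sender
must leave the original in place until the entry has been consumed.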


 # since an index rebuild can take a lot of time we need to provide UI
 feedback while doing that

 Any I/O operation can potentially take a lot of time, but with the
 current version of the DS, rebuilding an index with a few thousand
 entries is not so slow on the XO. We should never need to rebuild the
 index, so this new requirement might not be justified (given the
 current resources, all the other work we need to do, etc).

 OK, good to know index rebuilding is fast. So the simple, 

Re: [Sugar-devel] Datastore redesign

2009-07-09 Thread Eben Eliason
On Thu, Jul 9, 2009 at 8:00 AM, Tomeu Vizoso <to...@sugarlabs.org> wrote:
 On Mon, Jul 6, 2009 at 16:33, Eben Eliason <e...@laptop.org> wrote:
 On Mon, Jul 6, 2009 at 10:02 AM, Sascha Silbe
 <sascha-ml-ui-sugar-de...@silbe.org> wrote:
 On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:

 Agreed. I have to say that your proposal is excellent, congratulations!

 Thanks, I'm flattered. :)


 Is the asynchronous API design useful enough to warrant more complex
 implementation?

 I'm not sure, but I think that whatever decision we take should be
 made based on actual usage of the DS. What about proposing an example
 of how an existing activity would be modified to use the new API?

 OK, will work on one.


  - For save() calls, the activity needs to wait for the result
    (containing the new version_id) before it can invoke save() again for
    the same object. That can take quite some time if save() is
    synchronous, especially if other activities are saving at the same time.

 What about having a separate call that returns synchronously a new
 tree_id and/or version_id?

 Interesting idea, need to think about it. As we're going to use UUIDs,
 leaving requested versions unused shouldn't be an issue (for other version
 number schemes, like the one you propose below, holes in the numbering
 could be troublesome).

 I think holes can be expected, so we should definitely be prepared for them.


 Making the API fully asynchronous is the cause of much of the complexity
 of my proposal, but if we eliminate the queueing, the response times for
 write accesses and checkout() can be very long even for unrelated
 operations.

 Why for unrelated operations?

 Because we're serializing VCS operations. They are IO bound (more
 specifically: disk bound) and parallelisation would only lead to IO
 starvation, especially for HDDs.


 # do we want an optimized way to determine (only) the branch HEADs of
 a given tree_id?

 This depends on the intended UI. My opinion is that if we branch at
 every interesting modification (triggered by the activity detecting an
 interesting change or by the user clicking on the Keep button), we

 I don't think we need to branch in these instances. These events
 should create new versions, but not necessarily new branches.

 In my proposal, a branch is not a concept directly exposed to the user.
 It is just an artefact that allows the journal to display the relevant
 info to a user. If we branch at the following events, displaying only
 HEADs of branches in the journal list view makes sense for the user:

 - new tree_id,
 - resume entry,
 - keep button,
 - after a big change in the activity model (user deletes the whole
 drawing, etc).

 There are other ways to manage what we want, but this approach made it
 very easy to implement.

 Just to make sure I'm understood: I see why using branches this way
 may seem conceptually wrong; it's not how we would work in a VCS or
 CMS. But creating new branches at those points and displaying only
 HEADs of branches in the list view is the simplest way I found of
 displaying the relevant entries in a robust way (resisting activity
 crashes).

That's fine, assuming these are actually the entries we want to show,
but I'm not sure that's always the case. For instance, we might show
only the most recent version in the object view, while showing each
version within the action view. If we store action-objects as Ben
proposed, we may have an entirely different way of querying what
should be shown in the actions view anyway.

We've also had some ideas on how to expose the branching structure
within the version popup, in which case branching as one would in a
VCS would make more sense.

Eben

 would like to display in the object list all the HEADs of each branch
 in each tree_id. In that case yes, we need a way to retrieve that list
 that is fast on both the client and the server side.

 My imagined usage of branches was to create them automatically upon altering
 a non-HEAD version.

 This makes sense to me, personally.

 A user basing off an old version could mean the newer version is broken
 (in that case promoting the new version to the HEAD of the current branch
 makes more sense) or that (s)he uses the older version as a kind of template
 to create derivatives (so creating a branch would make most sense).
 But I'm open to alternative suggestions. We'd most likely need a way to
 explicitly create branches then.


 # using symlink instead of hardlink for incoming queue since we want
 to support directory trees, not just files

 What justifies this new requirement?

 That it's
 a) of use to activities (IIRC some of them use ZIP files instead right now),

 This has its own merits, though. Encapsulating the related files as a
 single archive makes transporting the file around, or sending it to a
 friend, trivial. It's good for the same reason that activity bundles
 are good.

 But this may be considered an internal implementation detail of the
 DS, unless we want to support users directly 

Re: [Sugar-devel] Datastore redesign

2009-07-06 Thread Eben Eliason
On Mon, Jul 6, 2009 at 10:02 AM, Sascha Silbe
<sascha-ml-ui-sugar-de...@silbe.org> wrote:
 On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:

 Agreed. I have to say that your proposal is excellent, congratulations!

 Thanks, I'm flattered. :)


 Is the asynchronous API design useful enough to warrant more complex
 implementation?

 I'm not sure, but I think that whatever decision we take should be
 made based on actual usage of the DS. What about proposing an example
 of how an existing activity would be modified to use the new API?

 OK, will work on one.


  - For save() calls, the activity needs to wait for the result
    (containing the new version_id) before it can invoke save() again for
    the same object. That can take quite some time if save() is
    synchronous, especially if other activities are saving at the same time.

 What about having a separate call that returns synchronously a new
 tree_id and/or version_id?

 Interesting idea, need to think about it. As we're going to use UUIDs,
 leaving requested versions unused shouldn't be an issue (for other version
 number schemes, like the one you propose below, holes in the numbering
 could be troublesome).

I think holes can be expected, so we should definitely be prepared for them.


 Making the API fully asynchronous is the cause of much of the complexity
 of my proposal, but if we eliminate the queueing, the response times for
 write accesses and checkout() can be very long even for unrelated
 operations.

 Why for unrelated operations?

 Because we're serializing VCS operations. They are IO bound (more
 specifically: disk bound) and parallelisation would only lead to IO
 starvation, especially for HDDs.


 # do we want an optimized way to determine (only) the branch HEADs of
 a given tree_id?

 This depends on the intended UI. My opinion is that if we branch at
 every interesting modification (triggered by the activity detecting an
 interesting change or by the user clicking on the Keep button), we

I don't think we need to branch in these instances. These events
should create new versions, but not necessarily new branches.

 would like to display in the object list all the HEADs of each branch
 in each tree_id. In that case yes, we need a way to retrieve that list
 that is fast on both the client and the server side.

 My imagined usage of branches was to create them automatically upon altering
 a non-HEAD version.

This makes sense to me, personally.

 A user basing off an old version could mean the newer version is broken
 (in that case promoting the new version to the HEAD of the current branch
 makes more sense) or that (s)he uses the older version as a kind of template
 to create derivatives (so creating a branch would make most sense).
 But I'm open to alternative suggestions. We'd most likely need a way to
 explicitly create branches then.


 # using symlink instead of hardlink for incoming queue since we want
 to support directory trees, not just files

 What justifies this new requirement?

 That it's
 a) of use to activities (IIRC some of them use ZIP files instead right now),

This has its own merits, though. Encapsulating the related files as a
single archive makes transporting the file around, or sending it to a
friend, trivial. It's good for the same reason that activity bundles
are good.

 b) easy enough to achieve with the new design and
 c) leads to better delta compression and thus disk space efficiency.


 # since an index rebuild can take a lot of time we need to provide UI
 feedback while doing that

 Any I/O operation can potentially take a lot of time, but with the
 current version of the DS, rebuilding an index with a few thousand
 entries is not so slow on the XO. We should never need to rebuild the
 index, so this new requirement might not be justified (given the
 current resources, all the other work we need to do, etc).

 OK, good to know index rebuilding is fast. So the simple, boolean API I
 proposed (check_ready() / Ready()) suffices.
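
One possible reading of a boolean readiness API is sketched below:
check_ready() answers "is the index usable right now?", and a one-shot
notification fires when it becomes usable. Only the names check_ready and
Ready come from the thread; IndexStatus, connect_ready and mark_ready are
hypothetical, and the actual semantics in the proposal may differ.

```python
class IndexStatus:
    """Hypothetical holder for the index readiness state."""

    def __init__(self):
        self._ready = False
        self._listeners = []

    def check_ready(self):
        # Boolean poll: no progress information, just ready or not.
        return self._ready

    def connect_ready(self, callback):
        # Stand-in for subscribing to a Ready() notification.
        self._listeners.append(callback)

    def mark_ready(self):
        # Called by the (hypothetical) index rebuild when it finishes.
        if not self._ready:
            self._ready = True
            for callback in self._listeners:
                callback()
```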


 # detecting identical files across objects isn't as important since
 duplicates are mostly expected to occur as versions of the same object

 Based on how current activities are using the DS, that isn't the case.
 The most common case I have heard of from the field is children
 downloading the same PDF several times for reading.

 Oh, didn't know that, so it's a new requirement.

 An alternative to the current method for detecting duplicates is moving
 this task to activities; is that what you suggest?
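
Wherever the check ends up living, detecting identical files across objects
usually comes down to comparing content digests. A minimal sketch, assuming
SHA-256 over the file contents; the function names are hypothetical and this
is not a description of the current datastore's method.

```python
import hashlib


def content_digest(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths whose contents are byte-for-byte identical."""
    by_digest = {}
    for path in paths:
        by_digest.setdefault(content_digest(path), []).append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```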

 I'm ambivalent about it. On the one hand, it's not so easy to achieve in
 the datastore (for various backends) and more indicative of UI deficiencies (why

This might be the case, indeed. On other operating systems/browsers,
this (downloading multiple copies if the link is clicked multiple
times) is expected behavior. Perhaps we can work out some ways to make
the UI clearer.

 did the children download the file several times in the first place; it's
 bandwidth wastage as well), on the other hand it might not