Re: [Sugar-devel] Datastore redesign
On Mon, Jul 6, 2009 at 16:33, Eben Eliason e...@laptop.org wrote:
On Mon, Jul 6, 2009 at 10:02 AM, Sascha Silbe sascha-ml-ui-sugar-de...@silbe.org wrote:
On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:

Agreed. I have to say that your proposal is excellent, congratulations!

Thanks, I'm flattered. :)

Is the asynchronous API design useful enough to warrant more complex implementation?

I'm not sure, but I think that whatever decision we take should be made based on actual usage of the DS. What about proposing an example of how an existing activity would be modified to use the new API?

OK, will work on one.

- For save() calls activity needs to wait for result (containing new version_id) before it can invoke save() again for the same object which can take quite some time if save() is sync - especially if other activities are saving at the same time.

What about having a separate call that returns synchronously a new tree_id and/or version_id?

Interesting idea, need to think about it. As we're going to use UUIDs not using requested versions shouldn't be an issue (for other version number schemes like the one you propose below holes in the numbering could be troublesome).

I think holes can be expected, so we should definitely be prepared for them.

Making the API fully asynchronous is the cause for much of the complexity of my proposal, but if we eliminate the queueing the response times for write accesses and checkout() can be very long even for unrelated operations.

Why for unrelated operations?

Because we're serializing VCS operations. They are IO bound (more specifically: disk bound) and parallelisation would only lead to IO starvation, especially for HDDs.

# do we want an optimized way to determine (only) the branch HEADs of a given tree_id?

This depends on the intended UI.
My opinion is that if we branch at every interesting modification (triggered by the activity detecting an interesting change or by the user clicking on the Keep button), we

I don't think we need to branch in these instances. These events should create new versions, but not necessarily new branches.

In my proposal, branch is not a concept directly exposed to the user. It is just an artefact that allows the journal to display the relevant info to a user. If we branch at the following events, displaying only HEADs of branches in the journal list view makes sense for the user:
- new tree_id,
- resume entry,
- keep button,
- after a big change in the activity model (user deletes the whole drawing, etc).
There are other ways to manage what we want, but this approach made it very easy to implement.

Just to make sure I'm understood, I see why using branches this way may seem conceptually wrong, it's not how we would work in a VCS or CMS. But creating new branches at those points and displaying only HEADs of branches in the list view is the simplest way I found of displaying the relevant entries in a robust way (resisting activity crashes).

would like to display in the object list all the HEADs of each branch in each tree_id.

In that case yes, we need a way to retrieve that list that is fast on both the client and the server side. My imagined usage of branches was to create them automatically upon altering a non-HEAD version.

This makes sense to me, personally.

A user basing off an old version could mean the newer version is broken (in that case promoting the new version to the HEAD of the current branch makes more sense) or that (s)he uses the older version as a kind of template to create derivatives (so creating a branch would make most sense). But I'm open to alternative suggestions.

We'd most likely need a way to explicitly create branches then.
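To make the branching rule discussed above a bit more concrete, here is a minimal sketch, not the proposed DS API. It assumes a purely in-memory model of one tree_id; the class VersionTree, its save() signature and the force_branch flag are all hypothetical names used only for illustration. A save based on the HEAD of a branch continues that branch, a save based on a non-HEAD version opens a new branch automatically, and force_branch stands in for an explicit "create branch" request.

```python
import uuid


class VersionTree:
    """Hypothetical in-memory model of one tree_id (not the real DS API)."""

    def __init__(self):
        self.tree_id = str(uuid.uuid4())
        self.parent = {}     # version_id -> parent version_id (None for the root)
        self.branch_of = {}  # version_id -> branch_id
        self.head = {}       # branch_id -> version_id currently at the HEAD

    def save(self, parent_id=None, force_branch=False):
        """Record a new version and return (version_id, branch_id)."""
        version_id = str(uuid.uuid4())
        if parent_id is None:
            # The very first version opens the first branch.
            branch_id = str(uuid.uuid4())
        else:
            parent_branch = self.branch_of[parent_id]
            if force_branch or self.head[parent_branch] != parent_id:
                # Explicit request, or the parent is not a HEAD: new branch.
                branch_id = str(uuid.uuid4())
            else:
                # Parent is the HEAD of its branch: continue that branch.
                branch_id = parent_branch
        self.parent[version_id] = parent_id
        self.branch_of[version_id] = branch_id
        self.head[branch_id] = version_id
        return version_id, branch_id
```

Whether events such as Keep or resume should pass force_branch is exactly the open question in this thread; the sketch only shows that both policies fit the same bookkeeping.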
# using symlink instead of hardlink for incoming queue since we want to support directory trees, not just files

What justifies this new requirement?

That it's a) of use to activities (IIRC some of them use ZIP files instead right now),

This has its own merits, though. Encapsulating the related files as a single archive makes transporting the file around, or sending it to a friend, trivial. It's good for the same reason that activity bundles are good.

But this may be considered an internal implementation detail of the DS, unless we want to support users directly browsing the DS backend.

b) easy enough to achieve with the new design and c) leads to better delta compression and thus disk space efficiency.

# since an index rebuild can take a lot of time we need to provide UI feedback while doing that

Any I/O operation can potentially take a lot of time, but with the current version of the DS rebuilding an index with a few thousand entries is not so slow on the XO. We should never need to rebuild the index, so this new requirement might not be justified (given the current resources, all the other work we need to do, etc).

OK, good to know index rebuilding is fast. So the simple,
Re: [Sugar-devel] Datastore redesign
On Thu, Jul 9, 2009 at 8:00 AM, Tomeu Vizoso to...@sugarlabs.org wrote:
On Mon, Jul 6, 2009 at 16:33, Eben Eliason e...@laptop.org wrote:
On Mon, Jul 6, 2009 at 10:02 AM, Sascha Silbe sascha-ml-ui-sugar-de...@silbe.org wrote:
On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:

Agreed. I have to say that your proposal is excellent, congratulations!

Thanks, I'm flattered. :)

Is the asynchronous API design useful enough to warrant more complex implementation?

I'm not sure, but I think that whatever decision we take should be made based on actual usage of the DS. What about proposing an example of how an existing activity would be modified to use the new API?

OK, will work on one.

- For save() calls activity needs to wait for result (containing new version_id) before it can invoke save() again for the same object which can take quite some time if save() is sync - especially if other activities are saving at the same time.

What about having a separate call that returns synchronously a new tree_id and/or version_id?

Interesting idea, need to think about it. As we're going to use UUIDs not using requested versions shouldn't be an issue (for other version number schemes like the one you propose below holes in the numbering could be troublesome).

I think holes can be expected, so we should definitely be prepared for them.

Making the API fully asynchronous is the cause for much of the complexity of my proposal, but if we eliminate the queueing the response times for write accesses and checkout() can be very long even for unrelated operations.

Why for unrelated operations?

Because we're serializing VCS operations. They are IO bound (more specifically: disk bound) and parallelisation would only lead to IO starvation, especially for HDDs.

# do we want an optimized way to determine (only) the branch HEADs of a given tree_id?

This depends on the intended UI.
My opinion is that if we branch at every interesting modification (triggered by the activity detecting an interesting change or by the user clicking on the Keep button), we

I don't think we need to branch in these instances. These events should create new versions, but not necessarily new branches.

In my proposal, branch is not a concept directly exposed to the user. It is just an artefact that allows the journal to display the relevant info to a user. If we branch at the following events, displaying only HEADs of branches in the journal list view makes sense for the user:
- new tree_id,
- resume entry,
- keep button,
- after a big change in the activity model (user deletes the whole drawing, etc).
There are other ways to manage what we want, but this approach made it very easy to implement.

Just to make sure I'm understood, I see why using branches this way may seem conceptually wrong, it's not how we would work in a VCS or CMS. But creating new branches at those points and displaying only HEADs of branches in the list view is the simplest way I found of displaying the relevant entries in a robust way (resisting activity crashes).

That's fine, assuming these are actually the entries we want to show, but I'm not sure that's always the case. For instance, we might show only the most recent version in the object view, while showing each version within the action view. If we store action-objects as Ben proposed, we may have an entirely different way of querying what should be shown in the actions view anyway. We've also had some ideas on how to expose the branching structure within the version popup, in which case branching as one would in a VCS would make more sense.

Eben

would like to display in the object list all the HEADs of each branch in each tree_id.

In that case yes, we need a way to retrieve that list that is fast on both the client and the server side. My imagined usage of branches was to create them automatically upon altering a non-HEAD version.
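One way the "fast on both the client and the server side" requirement for HEAD lookups could be met is to keep a small index that is updated on every save, so that a query never has to walk the version history. The following is only a sketch under that assumption; HeadIndex and its methods are hypothetical names, not part of any existing or proposed datastore interface.

```python
from collections import defaultdict


class HeadIndex:
    """Hypothetical index: tree_id -> {branch_id: HEAD version_id}."""

    def __init__(self):
        self._heads = defaultdict(dict)

    def record(self, tree_id, branch_id, version_id):
        # Called once per save: the newest version on a branch is its HEAD.
        self._heads[tree_id][branch_id] = version_id

    def heads(self, tree_id):
        # Cost grows with the number of branches, not the number of versions.
        return sorted(self._heads.get(tree_id, {}).values())

    def all_heads(self):
        # What a list view showing only branch HEADs would iterate over.
        return {tree_id: sorted(branches.values())
                for tree_id, branches in self._heads.items()}
```

If the object view ends up showing only the most recent version per object rather than every branch HEAD, the same index would need a timestamp per entry; that is left out here.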
This makes sense to me, personally.

A user basing off an old version could mean the newer version is broken (in that case promoting the new version to the HEAD of the current branch makes more sense) or that (s)he uses the older version as a kind of template to create derivatives (so creating a branch would make most sense). But I'm open to alternative suggestions.

We'd most likely need a way to explicitly create branches then.

# using symlink instead of hardlink for incoming queue since we want to support directory trees, not just files

What justifies this new requirement?

That it's a) of use to activities (IIRC some of them use ZIP files instead right now),

This has its own merits, though. Encapsulating the related files as a single archive makes transporting the file around, or sending it to a friend, trivial. It's good for the same reason that activity bundles are good.

But this may be considered an internal implementation detail of the DS, unless we want to support users directly
Re: [Sugar-devel] Datastore redesign
On Mon, Jul 6, 2009 at 10:02 AM, Sascha Silbe sascha-ml-ui-sugar-de...@silbe.org wrote:
On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:

Agreed. I have to say that your proposal is excellent, congratulations!

Thanks, I'm flattered. :)

Is the asynchronous API design useful enough to warrant more complex implementation?

I'm not sure, but I think that whatever decision we take should be made based on actual usage of the DS. What about proposing an example of how an existing activity would be modified to use the new API?

OK, will work on one.

- For save() calls activity needs to wait for result (containing new version_id) before it can invoke save() again for the same object which can take quite some time if save() is sync - especially if other activities are saving at the same time.

What about having a separate call that returns synchronously a new tree_id and/or version_id?

Interesting idea, need to think about it. As we're going to use UUIDs not using requested versions shouldn't be an issue (for other version number schemes like the one you propose below holes in the numbering could be troublesome).

I think holes can be expected, so we should definitely be prepared for them.

Making the API fully asynchronous is the cause for much of the complexity of my proposal, but if we eliminate the queueing the response times for write accesses and checkout() can be very long even for unrelated operations.

Why for unrelated operations?

Because we're serializing VCS operations. They are IO bound (more specifically: disk bound) and parallelisation would only lead to IO starvation, especially for HDDs.

# do we want an optimized way to determine (only) the branch HEADs of a given tree_id?

This depends on the intended UI. My opinion is that if we branch at every interesting modification (triggered by the activity detecting an interesting change or by the user clicking on the Keep button), we

I don't think we need to branch in these instances.
These events should create new versions, but not necessarily new branches.

would like to display in the object list all the HEADs of each branch in each tree_id.

In that case yes, we need a way to retrieve that list that is fast on both the client and the server side. My imagined usage of branches was to create them automatically upon altering a non-HEAD version.

This makes sense to me, personally.

A user basing off an old version could mean the newer version is broken (in that case promoting the new version to the HEAD of the current branch makes more sense) or that (s)he uses the older version as a kind of template to create derivatives (so creating a branch would make most sense). But I'm open to alternative suggestions.

We'd most likely need a way to explicitly create branches then.

# using symlink instead of hardlink for incoming queue since we want to support directory trees, not just files

What justifies this new requirement?

That it's a) of use to activities (IIRC some of them use ZIP files instead right now),

This has its own merits, though. Encapsulating the related files as a single archive makes transporting the file around, or sending it to a friend, trivial. It's good for the same reason that activity bundles are good.

b) easy enough to achieve with the new design and c) leads to better delta compression and thus disk space efficiency.

# since an index rebuild can take a lot of time we need to provide UI feedback while doing that

Any I/O operation can potentially take a lot of time, but with the current version of the DS rebuilding an index with a few thousand entries is not so slow on the XO. We should never need to rebuild the index, so this new requirement might not be justified (given the current resources, all the other work we need to do, etc).

OK, good to know index rebuilding is fast. So the simple, boolean API I proposed (check_ready() / Ready()) suffices.
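For readers who have not seen the proposal, the boolean check_ready() / Ready() pair could be used roughly as follows. This is only a sketch: the real interface would presumably be a D-Bus method plus a D-Bus signal, whereas here a plain Python object with callbacks stands in for both, and IndexState, connect_ready(), finish_rebuild() and when_ready() are hypothetical names.

```python
class IndexState:
    """Hypothetical stand-in for the datastore side of check_ready()/Ready()."""

    def __init__(self):
        self._ready = False
        self._listeners = []

    def check_ready(self):
        # Boolean query: is the index usable right now?
        return self._ready

    def connect_ready(self, callback):
        # Plays the role of subscribing to the Ready() signal.
        self._listeners.append(callback)

    def finish_rebuild(self):
        # Called by the datastore once the index is usable again.
        self._ready = True
        for callback in self._listeners:
            callback()  # plays the role of emitting Ready()


def when_ready(ds, action):
    """Client side: run action now if possible, else when Ready() fires."""
    if ds.check_ready():
        action()
    else:
        ds.connect_ready(action)
```

A client such as the Journal would call when_ready() once at startup and show a simple "please wait" state in between; no progress information is needed with this API.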
# detecting identical files across objects isn't as important since duplicates are mostly expected to occur as versions of the same object

Based on how current activities are using the DS, this isn't like that. The most common case I have heard from the field is children downloading a PDF for reading several times.

Oh, didn't know that, so it's a new requirement.

An alternative to the current method for detecting duplicates is moving this task to activities, is that what you suggest?

I'm ambivalent about it. On one hand it's not so easy to achieve in datastore (for various backends) and more indicative of UI deficiencies (why

This might be the case, indeed. On other operating systems/browsers, this (downloading multiple copies if the link is clicked multiple times) is expected behavior. Perhaps we can work out some ways to make the UI clearer.

did the children download the file several times in the first place; it's bandwidth wastage as well), on the other hand it might not