Is data shared between sandboxes? Could namespaces proxy for sandboxes?
On Wed, Mar 19, 2014 at 1:46 PM, Mike Drob <[email protected]> wrote:

> Thanks, that's really helpful. Couple more questions.
>
> Is a sandbox the same thing as a workspace? Can the terms be used
> interchangeably? Just want to make sure I'm not misinterpreting your
> answers.
>
> Is it fair to describe each sandbox as a separate index table for the
> global data set? And then when users do deletes, it is only reflected in
> the index fields, right?
> But you can't just delete values from the index because you need to keep
> track of the changes in case the user decides to delete globally (after
> appropriate authorization checks, etc...)
>
> Because the visibility is part of the key, changing it involves re-writing
> the data. Which might be just an index record in your case. However, this
> is generally an expensive operation.
>
> I think I need to think on this use case some more, it's definitely
> interesting and not something I had considered before.
>
> On Wed, Mar 19, 2014 at 1:24 PM, Jeff Kunkle <[email protected]> wrote:
>
>> You have a large amount of data, that is generally readable by all users.
>>
>> Not necessarily. All data has some visibility constraint that a user's
>> authorizations may or may not satisfy.
>>
>> Users create their own sandbox, from which they can later exclude
>> portions of the global data set.
>>
>> Yes, users create their own sandboxes which are populated with global
>> data. They may decide to delete some of that data and the change needs to
>> be scoped to their sandbox until the change is published globally.
>>
>> Users can share their sandbox with others, so really we are talking about
>> sandbox permissions and not so much user permissions.
>>
>> Yes, users can share their sandbox with others, but a sandbox is just a
>> collection of pointers to data. Users sharing a workspace may not
>> necessarily see all of the same data depending on their authorizations.
>>
>> Sandboxes are created often. Or, at least much more often than the data
>> changes.
>>
>> Yes, sandboxes are created often. The data is likely to be ingested more
>> frequently than sandboxes will be created.
>>
>> Do users typically remove large amounts of data from their sandbox? 1%?
>> 10%? 99%?
>>
>> I don't have good numbers to share here.
>>
>> Assuming data is removed via rules, are the rules applied automatically
>> to new data under ingest?
>>
>> I would say no, although I'm not positive I understand the question.
>> Users are not removing data from their sandbox per se, but they may delete
>> data that should then be hidden from their workspace. The data is not
>> really deleted though and is still visible to other users in other
>> sandboxes. Only when the deletion is published does it get deleted for
>> everyone.
>>
>> On Mar 19, 2014, at 1:03 PM, Mike Drob <[email protected]> wrote:
>>
>> Wait, I'm really confused by what you are describing, Jeff. Sorry if
>> these are obvious questions, but can you help me get a better grasp of your
>> use case?
>>
>> You have a large amount of data, that is generally readable by all users.
>> Users create their own sandbox, from which they can later exclude
>> portions of the global data set.
>> Users can share their sandbox with others, so really we are talking about
>> sandbox permissions and not so much user permissions.
>> Sandboxes are created often. Or, at least much more often than the data
>> changes.
>>
>> Are those all accurate statements? If so, can you clarify the following
>> points:
>>
>> Do users typically remove large amounts of data from their sandbox? 1%?
>> 10%? 99%?
>> Assuming data is removed via rules, are the rules applied automatically
>> to new data under ingest?
>>
>> Thanks,
>> Mike
>>
>> On Wed, Mar 19, 2014 at 12:54 PM, Jeff Kunkle <[email protected]> wrote:
>>
>>> Hi John,
>>>
>>> Yes it's accurate that the system controls the label and who is
>>> associated with it; there are no Accumulo-internal user accounts. But I
>>> don't think it's feasible to remove a sandbox label from something that
>>> should be hidden. Such a scenario would imply that all data is "tagged"
>>> with the labels of every sandbox that is allowed to see the data, which
>>> would be most. It would also imply that the creation of a new sandbox would
>>> necessitate changing the visibility of everything in Accumulo to include
>>> the new sandbox label, effectively rewriting the entire database. Sandboxes
>>> are created and deleted all the time in our application, so it doesn't seem
>>> like a feasible solution to me.
>>>
>>> -Jeff
>>>
>>> On Mar 19, 2014, at 12:16 PM, Josh Elser <[email protected]> wrote:
>>>
>>> > It kind of sounds like you could manage this much easier by
>>> > controlling the authorizations a user gets (notably the workspace name)
>>> > and the grant/revoke above the Accumulo level.
>>> >
>>> > A sandbox has a unique label and the external system controls which
>>> > users are granted that label. This way, each sandbox can be modified
>>> > individually (using authorizations that contain the data visibility and
>>> > the sandbox label) or the original data set could be modified (by
>>> > omitting a sandbox label in the authorizations used).
>>> >
>>> > Is that accurate?
>>> >
>>> > On 3/19/14, 12:05 PM, Jeff Kunkle wrote:
>>> >> I attempted to simplify the scenario to facilitate discussion, which on
>>> >> second thought may have been a mistake. Here's the whole scenario:
>>> >>
>>> >> Different users have access to different subsets of the data depending
>>> >> on their authorizations and the visibility of the data. Users "work
>>> >> with" the data in what we call a sandbox. Sandboxes can be shared with
>>> >> other users (this is the group creation I was talking about earlier).
>>> >> Deletes to the data would be "scoped" to the sandbox by changing the
>>> >> visibility to add "& !workspace_name" so that people viewing the
>>> >> workspace wouldn't see the data but everyone else would.
>>> >>
>>> >> On Mar 19, 2014, at 11:48 AM, Sean Busbey <[email protected]> wrote:
>>> >>
>>> >>> On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle <[email protected]> wrote:
>>> >>>
>>> >>> New groups are created on the fly by our application when needed.
>>> >>> Under the scenario you describe we'd have to go through all the
>>> >>> data in Accumulo whenever a group is created so that users in the
>>> >>> group can see the existing data.
>>> >>>
>>> >>> Ah! So your use case is that all data defaults to world readable and
>>> >>> then users have the option of opting out of seeing subsets. Right?
>>> >>>
>>> >>> In your scenario user groups also get to opt-out of seeing data on the
>>> >>> fly, yes? Both require rewriting the data. Does the group creation
>>> >>> happen more often?
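
For anyone following the thread, here is a minimal sketch of the behaviour Jeff describes: a sandbox is only a set of pointers to hidden rows, deletes stay scoped to the sandbox until they are published, and creating a sandbox rewrites nothing. It is a plain-Python model, not Accumulo code. The names Entry, Sandbox and Store are hypothetical, and visibility is simplified to "the user must hold every listed label", which is an assumption made for illustration and much narrower than real visibility expressions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """One record in the global data set (hypothetical model, not an Accumulo type)."""
    row: str
    value: str
    required_labels: frozenset  # simplified visibility: every label must be held


@dataclass
class Sandbox:
    """A sandbox only stores pointers to the rows its users chose to hide."""
    name: str
    hidden_rows: set = field(default_factory=set)


class Store:
    def __init__(self, entries):
        self.entries = {e.row: e for e in entries}

    def create_sandbox(self, name):
        # Creating a sandbox touches no stored entries.
        return Sandbox(name)

    def delete_in_sandbox(self, sandbox, row):
        # Scoped delete: record the row in the overlay, leave global data alone.
        sandbox.hidden_rows.add(row)

    def scan(self, authorizations, sandbox=None):
        # A row is returned only if the authorizations satisfy its visibility
        # and the sandbox (if any) has not hidden it.
        auths = set(authorizations)
        hidden = sandbox.hidden_rows if sandbox else set()
        return sorted(
            e.row for e in self.entries.values()
            if e.required_labels <= auths and e.row not in hidden
        )

    def publish_deletes(self, sandbox):
        # Publishing makes the sandbox's deletes global, then empties the overlay.
        for row in sandbox.hidden_rows:
            self.entries.pop(row, None)
        sandbox.hidden_rows.clear()


store = Store([
    Entry("r1", "a", frozenset({"public"})),
    Entry("r2", "b", frozenset({"public"})),
    Entry("r3", "c", frozenset({"public", "secret"})),
])
ws = store.create_sandbox("ws1")
store.delete_in_sandbox(ws, "r2")
print(store.scan({"public"}, ws))  # rows seen inside the sandbox
print(store.scan({"public"}))      # rows seen outside any sandbox
```

The point of the overlay is that the per-sandbox state grows with the number of deletes, not with the size of the data set, so frequent sandbox creation and deletion stays cheap. Two users sharing the same sandbox can still see different rows, because the authorization check is applied independently of the hidden set.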
