Wait, I'm really confused by what you are describing, Jeff. Sorry if these are obvious questions, but can you help me get a better grasp of your use case?
You have a large amount of data that is generally readable by all users. Users create their own sandboxes, from which they can later exclude portions of the global data set. Users can share their sandboxes with others, so really we are talking about sandbox permissions and not so much user permissions. Sandboxes are created often, or at least much more often than the data changes.

Are those all accurate statements? If so, can you clarify the following points:

Do users typically remove large amounts of data from their sandbox? 1%? 10%? 99%?

Assuming data is removed via rules, are the rules applied automatically to new data under ingest?

Thanks,
Mike

On Wed, Mar 19, 2014 at 12:54 PM, Jeff Kunkle <[email protected]> wrote:

> Hi John,
>
> Yes it's accurate that the system controls the label and who is associated
> with it; there are no Accumulo-internal user accounts. But I don't think
> it's feasible to remove a sandbox label from something that should be
> hidden. Such a scenario would imply that all data is "tagged" with the
> labels of every sandbox that is allowed to see the data, which would be
> most. It would also imply that the creation of a new sandbox would
> necessitate changing the visibility of everything in Accumulo to include
> the new sandbox label, effectively rewriting the entire database. Sandboxes
> are created and deleted all the time in our application, so it doesn't seem
> like a feasible solution to me.
>
> -Jeff
>
> On Mar 19, 2014, at 12:16 PM, Josh Elser <[email protected]> wrote:
>
> > It kind of sounds like you could manage this much easier by controlling
> > the authorizations a user gets (notably the workspace name) and the
> > grant/revoke above the Accumulo level.
> >
> > A sandbox has a unique label and the external system controls which
> > users are granted that label. This way, each sandbox can be modified
> > individually (using authorizations that contain the data visibility and the
> > sandbox label) or the original data set could be modified (by omitting a
> > sandbox label in the authorizations used).
> >
> > Is that accurate?
> >
> > On 3/19/14, 12:05 PM, Jeff Kunkle wrote:
> >> I attempted to simplify the scenario to facilitate discussion, which on
> >> second thought may have been a mistake. Here's the whole scenario:
> >>
> >> Different users have access to different subsets of the data depending
> >> on their authorizations and the visibility of the data. Users "work
> >> with" the data in what we call a sandbox. Sandboxes can be shared with
> >> other users (this is the group creation I was talking about earlier).
> >> Deletes to the data would be "scoped" to the sandbox by changing the
> >> visibility to add "& !workspace_name" so that people viewing the
> >> workspace wouldn't see the data but everyone else would.
> >>
> >> On Mar 19, 2014, at 11:48 AM, Sean Busbey <[email protected]
> >> <mailto:[email protected]>> wrote:
> >>
> >>> On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle <[email protected]
> >>> <mailto:[email protected]>> wrote:
> >>>
> >>>     New groups are created on the fly by our application when needed.
> >>>     Under the scenario you describe we'd have to go through all the
> >>>     data in Accumulo whenever a group is created so that users in the
> >>>     group can see the existing data.
> >>>
> >>> Ah! So your use case is that all data defaults to world readable and
> >>> then users have the option of opting out of seeing subsets. Right?
> >>>
> >>> In your scenario user groups also get to opt-out of seeing data on the
> >>> fly, yes? Both require rewriting the data. Does the group creation
> >>> happen more often?
> >>
> >
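
To make sure we mean the same thing, here is a toy model of the semantics summarized at the top of this message. It is only a sketch and not Accumulo's API: `Sandbox`, `visible_rows`, the row IDs, and the labels are all hypothetical, and data visibility is simplified to "the user must hold every required label". It illustrates the property in question: creating a sandbox touches no data, and an exclusion hides a row only from people viewing through that sandbox.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical sandbox: a label plus the row IDs hidden inside it."""
    label: str
    excluded: set = field(default_factory=set)


def visible_rows(rows, user_auths, sandbox=None):
    """Return the sorted row IDs a user can see.

    rows:       dict mapping row ID -> set of labels required to read it
                (simplifying assumption: all labels must be held).
    user_auths: set of labels the user holds.
    sandbox:    optional Sandbox whose exclusions apply to this view only.
    """
    result = []
    for row_id, required in rows.items():
        if not required <= user_auths:
            continue  # user lacks a label the data itself requires
        if sandbox is not None and row_id in sandbox.excluded:
            continue  # hidden in this sandbox, still visible elsewhere
        result.append(row_id)
    return sorted(result)


# Global data set: mostly world readable, one restricted row.
rows = {"r1": set(), "r2": {"secret"}, "r3": set()}

# Creating a sandbox does not modify `rows` at all.
sandbox = Sandbox(label="sb1", excluded={"r3"})

in_sandbox = visible_rows(rows, {"secret"}, sandbox)
outside_sandbox = visible_rows(rows, set())
print(in_sandbox, outside_sandbox)
```

If that matches your intent, then the two questions above (how much data is excluded, and whether exclusion rules apply to newly ingested data) decide whether keeping exclusions per sandbox like this is cheaper than rewriting visibilities.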
