The 1.5 solution looks nice. I'm aware of the potential data loss issue, and the sort-ordering point is also an interesting one, thank you.
In my particular case, I may not know every column visibility permutation that exists for a given key, but I want to replace them all with one new visibility holding the same data. How would I go about that? Is there a way to use a BatchScanner (step 1 of the BatchDeleter approach) to pull down all the permutations, then issue putDeletes for them and put what I want?

In my case I'm pulling one copy of the data down first to verify I have it at the user's current scan auth, then using approach #1 to clear it out, and then putting it in again with the visibility I need.

On Mon, May 13, 2013 at 10:05 AM, Keith Turner <[email protected]> wrote:

> On Fri, May 10, 2013 at 12:39 PM, Marc Reichman <[email protected]> wrote:
>
>> I have a table with rows which have 3 column values in one column family,
>> and a column visibility.
>>
>> There are situations where I will want to replace the row content with a
>> new column visibility; I understand that the visibility attributes are
>> immutable, so I will have to delete and re-put.
>>
>> Am I better off doing:
>> 1. BatchDeleter with authorizations to allow access, set range to the key
>>    in question, call delete, and then put in mutations with the new visibility
>> 2. Create mutations with a putDelete followed by a put with the new
>>    visibility for each value
>> 3. Something else entirely?
>
> In 1.5, you can use ACCUMULO-956
>
>> For option #2, can I simply do a putDelete on the column
>> family/qualifier? Or do I need to "know" the old authorizations to put in a
>> visibility expression with the putDelete?
>>
>> For all of these, can a client get up-to-the-minute results immediately
>> after? Or does some kind of compaction need to occur first?
>
> If you send a mutation with a delete and put, the client will be able to
> see it after the batchwriter flushes or closes. No compaction needed.
>
> I am a little fuzzy on #1. Will you delete everything in one pass (using
> batchdeleter), and then do another pass writing data w/ updated colvis? If
> so, this would seem to imply that you are pulling the data from another
> source (other than the table stuff was deleted from)?
>
> Make sure the method you choose is not susceptible to data loss in the
> event that the client dies. For example, suppose a client was reading a
> table and then writing a delete and update mutation for each key/val read.
> If the client died and some deletes were written, but not the corresponding
> updates, then that data would not be seen to be transformed on the second
> run.
>
> When you change the colvis, you change the sort order. Say you read a key
> K and change it to K', where K' sorts after K. If you insert K', it's
> possible that you may read it. It's being inserted in front of the
> scanner's pointer. Because of buffering in the batch writer and scanner,
> this would not occur always, but it would occur occasionally. Something to
> be aware of.
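
To make the sort-order point from the quoted message concrete, here is a minimal sketch that uses a plain Java TreeMap as a stand-in for the table. It is not Accumulo code. The composite string key, the names `ColVisResortDemo` and `rewriteWhileScanning`, and the visibilities `public`, `secret` and `admin` are all made up for illustration. There is also no buffering in this model, so the re-read happens every time rather than occasionally.

```java
import java.util.TreeMap;

// Toy model only: a sorted map stands in for the table, and a key is
// row + qualifier + visibility, so the visibility takes part in the ordering.
final class ColVisResortDemo {

    // Hypothetical composite key; '\0' separates the parts and sorts lowest.
    static String key(String row, String qualifier, String visibility) {
        return row + '\0' + qualifier + '\0' + visibility;
    }

    static String visibilityOf(String key) {
        return key.substring(key.lastIndexOf('\0') + 1);
    }

    static String withVisibility(String key, String visibility) {
        return key.substring(0, key.lastIndexOf('\0') + 1) + visibility;
    }

    // Walks the table in sorted order and rewrites each entry under newVis.
    // Returns how many keys the cursor visited, including rewritten keys
    // that landed ahead of the cursor and were read a second time.
    static int rewriteWhileScanning(TreeMap<String, String> table, String newVis) {
        int visited = 0;
        String cursor = table.isEmpty() ? null : table.firstKey();
        while (cursor != null) {
            visited++;
            // Guard: skip entries that already carry the new visibility.
            if (!visibilityOf(cursor).equals(newVis)) {
                String value = table.remove(cursor);             // the "delete"
                table.put(withVisibility(cursor, newVis), value); // the "put"
            }
            cursor = table.higherKey(cursor); // advance past the key just read
        }
        return visited;
    }

    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(key("r1", "q", "public"), "v1");
        table.put(key("r2", "q", "public"), "v2");
        return table;
    }

    public static void main(String[] args) {
        // "secret" sorts after "public": every rewritten key is read again.
        System.out.println(rewriteWhileScanning(sampleTable(), "secret")); // prints 4
        // "admin" sorts before "public": rewritten keys fall behind the cursor.
        System.out.println(rewriteWhileScanning(sampleTable(), "admin"));  // prints 2
    }
}
```

The guard that skips entries already carrying the new visibility is what keeps the second read harmless in this model. Whatever approach is used against the real table would need some equivalent check.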
