[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots

2019-08-09 Thread GitBox
keith-turner commented on issue #1043: Support stable ~del split points and 
avoid ~del hotspots 
URL: https://github.com/apache/accumulo/issues/1043#issuecomment-519962464
 
 
   Another reason to resolve in upgrade is that over time handling more legacy 
persisted formats becomes harder and harder to test.  For example assume in the 
future there are 5 different legacy data formats for the metadata table, will 
we ever test different combination of the legacy formats?  No, that testing 
will likely not be done.
   
   Doing a transformation in the upgrade step and testing that transformation 
independently seems more maintainable.  The current code only needs to worry 
about testing the current format.  The testing is much simpler and can better 
cover the data expected.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots

2019-08-09 Thread GitBox
keith-turner commented on issue #1043: Support stable ~del split points and 
avoid ~del hotspots 
URL: https://github.com/apache/accumulo/issues/1043#issuecomment-519959871
 
 
   > what did anyone think of the solution in my previous comment that would 
not require an upgrade?
   
   I think its a clever solution and I think that utilizing the values is a 
good way to go if we do not resolve this in upgrade.  I am leaning towards 
resolving this in upgrade because I think it keeps the existing code simpler, 
but I am not 100% sure this is the best path to take.  
   
   The `makeRelative()` function you asked about earlier has some code in it to 
handle legacy paths that makes it more complicated.  We should improve 
`makeRelative()` to clearly document parts of it that are handling legacy 
paths.  That code is confusing now because of the legacy support.
   
   If think if you pursue not resolving this in upgrade that it would be best 
to clearly document and isolate the code that handles legacy delete candidates.
   
   If you are going to pursue the upgrade path, the following is one possible 
way to do it.
   
* First submit a PR that changes the code to use hashes without any concern 
for upgrade.  So it only works with a newyly initialized system at first.  Just 
add something that throws an exception in the upgrade code.
* Second submit a PR for the upgrade code.
   
   I will try to get #1313 in ASAP so you can build on it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots

2019-08-08 Thread GitBox
keith-turner commented on issue #1043: Support stable ~del split points and 
avoid ~del hotspots 
URL: https://github.com/apache/accumulo/issues/1043#issuecomment-519619973
 
 
   I think it would make sense to fail fast on empty paths throwing and 
exception when trying to set them.  Anything that consumes gc candidate paths 
should also fail on empy paths, but in practice it would not see them.  I have 
found its nice if code that reads persisted data does not assumet that it was 
written in a certain way. 
   
   In #1313 I pushed all reading and writing of GC candidates into Ample and 
moved this out of the GC code.  I think it would make sense if Ample added and 
removed the hashes, but code outside of Ample was never aware of the hashes and 
only dealt with paths.
   
   > My next question - how is code that requires a migration path managed? 
Would it be checked into the master baseline for 2.1?
   
   The metadata should be rewritten in [Upgrader9to10.upgradeMetadata()](
   
https://github.com/apache/accumulo/blob/221aaf8f8805b5abfe07be5244dfbb28f8fc79d0/server/master/src/main/java/org/apache/accumulo/master/upgrade/Upgrader9to10.java#L66)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots

2019-07-29 Thread GitBox
keith-turner commented on issue #1043: Support stable ~del split points and 
avoid ~del hotspots 
URL: https://github.com/apache/accumulo/issues/1043#issuecomment-516022400
 
 
   > The first seems desirable because it does not require supporting two 
formats down the road.
   
   I agree it will make future maintenance of the code easier.
   
   > Can you turn off compaction and balancing (whatever) during a migration 
upgrade?
   
   The master process can control what it does.  Since the master does 
migration, balancing, and loading it could pause those.  Can't remember if it 
does.  However things done by tablet servers it may not be able to control, I 
opened #1300 to futher explore this.
   
   For the `~del` markers there is the possibility that the Accumulo GC will 
temporarily miss a `~del` marker, but this is ok because it will be seen later. 
 Just need to make sure the masters move of the ~del markers is idempotent.
   
   The master process will abort if there are any outstanding FATE transactions 
and an upgrade is needed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services