[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots
keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots URL: https://github.com/apache/accumulo/issues/1043#issuecomment-519962464 Another reason to resolve in upgrade is that over time handling more legacy persisted formats becomes harder and harder to test. For example assume in the future there are 5 different legacy data formats for the metadata table, will we ever test different combination of the legacy formats? No, that testing will likely not be done. Doing a transformation in the upgrade step and testing that transformation independently seems more maintainable. The current code only needs to worry about testing the current format. The testing is much simpler and can better cover the data expected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots
keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots URL: https://github.com/apache/accumulo/issues/1043#issuecomment-519959871 > what did anyone think of the solution in my previous comment that would not require an upgrade? I think its a clever solution and I think that utilizing the values is a good way to go if we do not resolve this in upgrade. I am leaning towards resolving this in upgrade because I think it keeps the existing code simpler, but I am not 100% sure this is the best path to take. The `makeRelative()` function you asked about earlier has some code in it to handle legacy paths that makes it more complicated. We should improve `makeRelative()` to clearly document parts of it that are handling legacy paths. That code is confusing now because of the legacy support. If think if you pursue not resolving this in upgrade that it would be best to clearly document and isolate the code that handles legacy delete candidates. If you are going to pursue the upgrade path, the following is one possible way to do it. * First submit a PR that changes the code to use hashes without any concern for upgrade. So it only works with a newyly initialized system at first. Just add something that throws an exception in the upgrade code. * Second submit a PR for the upgrade code. I will try to get #1313 in ASAP so you can build on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots
keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots URL: https://github.com/apache/accumulo/issues/1043#issuecomment-519619973 I think it would make sense to fail fast on empty paths throwing and exception when trying to set them. Anything that consumes gc candidate paths should also fail on empy paths, but in practice it would not see them. I have found its nice if code that reads persisted data does not assumet that it was written in a certain way. In #1313 I pushed all reading and writing of GC candidates into Ample and moved this out of the GC code. I think it would make sense if Ample added and removed the hashes, but code outside of Ample was never aware of the hashes and only dealt with paths. > My next question - how is code that requires a migration path managed? Would it be checked into the master baseline for 2.1? The metadata should be rewritten in [Upgrader9to10.upgradeMetadata()]( https://github.com/apache/accumulo/blob/221aaf8f8805b5abfe07be5244dfbb28f8fc79d0/server/master/src/main/java/org/apache/accumulo/master/upgrade/Upgrader9to10.java#L66) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [accumulo] keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots
keith-turner commented on issue #1043: Support stable ~del split points and avoid ~del hotspots URL: https://github.com/apache/accumulo/issues/1043#issuecomment-516022400 > The first seems desirable because it does not require supporting two formats down the road. I agree it will make future maintenance of the code easier. > Can you turn off compaction and balancing (whatever) during a migration upgrade? The master process can control what it does. Since the master does migration, balancing, and loading it could pause those. Can't remember if it does. However things done by tablet servers it may not be able to control, I opened #1300 to futher explore this. For the `~del` markers there is the possibility that the Accumulo GC will temporarily miss a `~del` marker, but this is ok because it will be seen later. Just need to make sure the masters move of the ~del markers is idempotent. The master process will abort if there are any outstanding FATE transactions and an upgrade is needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services