[GitHub] [accumulo] keith-turner commented on issue #1647: Compaction tests failing in TableOperationsIT
keith-turner commented on issue #1647: URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653278435 I was thought of a fairly simple way to implement empty compactions w/o riddling the code w/ special cases for the empty set. Using a set with a special URI for this case instead of implementing special handling for the empty set made the changes quite small. I made the changes in the following branch and the test pass now. https://github.com/keith-turner/accumulo/tree/accumulo-1647 These changes build on the changes in #1649. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner edited a comment on issue #1647: Compaction tests failing in TableOperationsIT
keith-turner edited a comment on issue #1647: URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653234692 > My main concern is inconsistent behavior. Iterators that generate data will work when there is existing files... but not work when there isn't. That is a good point. When I was thinking about this, I thought the generator iterators would only generate data when the source iterator was empty, but I suppose that does not have to be the case. If a tablet has data, then a generator iterator would have to merge the generated data with existing data while respecting sorting. Also the generator iterator would need to only generate data within the bounds of the tablet. Actually writing one of these that is correct seems non-trivial. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner commented on issue #1647: Compaction tests failing in TableOperationsIT
keith-turner commented on issue #1647: URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653234692 > My main concern is inconsistent behavior. Iterators that generate data will work when there is existing files... but not work when there isn't. That is a good point. When I was thinking about this, I thought the generator iterators would only generate data when the source iterator was empty, but I suppose that does not have to be the case. If a tablet has data, then a generator iterator would have to merge the generated data with existing data while respecting sorting. Also the generator iterator would need to only generate data within the bounds of the table. Actually writing one of these that is correct seems non-trivial. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Accumulo-Master - Build # 3115 - Fixed
The Apache Jenkins build system has built Accumulo-Master (build #3115) Status: Fixed Check console output at https://builds.apache.org/job/Accumulo-Master/3115/ to view the results.
[GitHub] [accumulo] ctubbsii commented on issue #1647: Compaction tests failing in TableOperationsIT
ctubbsii commented on issue #1647: URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653226825 My main concern is inconsistent behavior. Iterators that generate data will work when there is existing files... but not work when there isn't. This is very unintuitive, and the actual state that the behavior turns on is invisible to the end user who executes a compaction. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner opened a new pull request #1649: Addresses three compaction service follow on issues.
keith-turner opened a new pull request #1649: URL: https://github.com/apache/accumulo/pull/1649 While waiting for #1632 to be reviewed I completed multiple follow on issues for the new compaction service. I am submitting those as a single PR, but there is a commit for each. I can break them into multiple PRs if desired. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner merged pull request #1632: Fix #1609 dynamically reinitialize compaction services when config ch…
keith-turner merged pull request #1632: URL: https://github.com/apache/accumulo/pull/1632 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner closed issue #1609: Reinitialize compaction services when configuration changes
keith-turner closed issue #1609: URL: https://github.com/apache/accumulo/issues/1609 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner closed issue #1647: Compaction tests failing in TableOperationsIT
keith-turner closed issue #1647: URL: https://github.com/apache/accumulo/issues/1647 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] milleruntime commented on issue #1647: Compaction tests failing in TableOperationsIT
milleruntime commented on issue #1647: URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653182711 From @EdColeman Now sure if this helps - but there is a use case for having an empty tablet. If a user processes a data set and generates multiple tables, say data, metrics and errors - the ideal case would be that there are no errors (so an empty table). Rather than handling the logic of if the table does not exist, there must be no errors, it was simpler for their query / scan logic to have the empty table. The idea is that if there are issues, they could scan the errors to determine where things are going off the rails. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] keith-turner commented on a change in pull request #1635: Create bulkRename method in VolumeManager
keith-turner commented on a change in pull request #1635: URL: https://github.com/apache/accumulo/pull/1635#discussion_r449212013 ## File path: server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java ## @@ -289,6 +295,45 @@ public boolean rename(Path path, Path newPath) throws IOException { return source.rename(path, newPath); } + @Override + public void bulkRename(Map oldToNewPathMap, int poolSize, String poolName, + String transactionId) throws IOException { +List> results = new ArrayList<>(); +SimpleThreadPool workerPool = new SimpleThreadPool(poolSize, poolName); +oldToNewPathMap.forEach((oldPath, newPath) -> results.add(workerPool.submit(() -> { + boolean success; + try { +success = rename(oldPath, newPath); + } catch (IOException e) { +// The rename could have failed because this is the second time its running (failures +// could cause this to run multiple times). +if (!exists(newPath) || exists(oldPath)) { + throw e; +} +log.debug( +"Ignoring rename exception for {} because destination already exists. orig: {} new: {}", +transactionId, oldPath, newPath, e); +success = true; + } + if (!success) { +log.error("Rename operation {} returned false. orig: {} new: {}", transactionId, oldPath, +newPath); + } else if (log.isTraceEnabled()) { Review comment: Seems like this should do the following. ```suggestion if (!success && (!exists(newPath) || exists(oldPath))) { throw new IOException("Rename operation "+transactionId+" returned false. orig: "+oldPath+" new: "+newPath); } else if (log.isTraceEnabled()) { ``` ## File path: server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManager.java ## @@ -149,6 +150,11 @@ FSDataOutputStream createSyncable(Path logPath, int buffersize, short replicatio // volumes boolean rename(Path path, Path newPath) throws IOException; + // rename lots of files at once in a thread pool + // returns once all the threads have completed Review comment: could add something to the docs about this operation being idempotent, safe to call again after previous partial run of bulkRename This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] milleruntime commented on a change in pull request #1635: Create bulkRename method in VolumeManager
milleruntime commented on a change in pull request #1635: URL: https://github.com/apache/accumulo/pull/1635#discussion_r449123192 ## File path: server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java ## @@ -289,6 +294,38 @@ public boolean rename(Path path, Path newPath) throws IOException { return source.rename(path, newPath); } + @Override + public List> bulkRename(Map oldToNewPathMap, + SimpleThreadPool workerPool, String transactionId) throws InterruptedException { +List> results = new ArrayList<>(); +oldToNewPathMap.forEach((oldPath, newPath) -> results.add(workerPool.submit(() -> { + boolean success; + try { +success = rename(oldPath, newPath); + } catch (IOException e) { +// The rename could have failed because this is the second time its running (failures +// could cause this to run multiple times). +if (!exists(newPath) || exists(oldPath)) { + throw e; +} +log.debug( +"Ignoring rename exception for {} because destination already exists. orig: {} new: {}", +transactionId, oldPath, newPath, e); +success = true; + } + if (!success) { +log.error("Rename operation {} returned false. orig: {} new: {}", transactionId, oldPath, Review comment: This did seem to clean up the handling in the FATE methods. Checkout my latest, I think this is ready to go now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] milleruntime commented on a change in pull request #1635: Create bulkRename method in VolumeManager
milleruntime commented on a change in pull request #1635: URL: https://github.com/apache/accumulo/pull/1635#discussion_r449082986 ## File path: server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java ## @@ -289,6 +294,39 @@ public boolean rename(Path path, Path newPath) throws IOException { return source.rename(path, newPath); } + @Override + public List> bulkRename(Map oldToNewPathMap, int poolSize, + String poolName, String transactionId) throws InterruptedException { +List> results = new ArrayList<>(); +SimpleThreadPool workerPool = new SimpleThreadPool(poolSize, poolName); +oldToNewPathMap.forEach((oldPath, newPath) -> results.add(workerPool.submit(() -> { + boolean success; + try { +success = rename(oldPath, newPath); + } catch (IOException e) { +// The rename could have failed because this is the second time its running (failures +// could cause this to run multiple times). +if (!exists(newPath) || exists(oldPath)) { Review comment: I decided since we are already logging an error per file when rename returns false, we should not also throw a new error, failing the transation. It seems like this is just to rethrow any errors we may care about. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [accumulo] milleruntime commented on issue #1647: Compaction tests failing in TableOperationsIT
milleruntime commented on issue #1647: URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653024418 With the limited full time contributors we have now, I would error on the side of less complexity by removing the test. With this feature not advertised, it is possible, even likely, that it is not used. As with all new features, there is new behavior so users have a different set of expectations. We can always add the complexity later if there is a legitimate use case. I would remove the ```@Test``` annotation with a comment, citing the issue number with your comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org