[GitHub] [accumulo] keith-turner commented on issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


keith-turner commented on issue #1647:
URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653278435


   I was thought of a fairly simple way to implement empty compactions w/o 
riddling the code w/ special cases for the empty set.  Using a set with a 
special URI for this case instead of implementing special handling for the 
empty set made the changes quite small.  I made the changes in the following 
branch and the test pass now.
   
   https://github.com/keith-turner/accumulo/tree/accumulo-1647
   
   These changes build on the changes in #1649.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner edited a comment on issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


keith-turner edited a comment on issue #1647:
URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653234692


   > My main concern is inconsistent behavior. Iterators that generate data 
will work when there is existing files... but not work when there isn't. 
   
   That is a good point.  When I was thinking about this, I thought the 
generator iterators would only generate data when the source iterator was 
empty, but I suppose that does not have to be the case.  If a tablet has data, 
then a generator iterator would have to merge the generated data with existing 
data while respecting sorting.  Also the generator iterator would need to only 
generate data within the bounds of the tablet.  Actually writing one of these 
that is correct seems non-trivial.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner commented on issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


keith-turner commented on issue #1647:
URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653234692


   > My main concern is inconsistent behavior. Iterators that generate data 
will work when there is existing files... but not work when there isn't. 
   
   That is a good point.  When I was thinking about this, I thought the 
generator iterators would only generate data when the source iterator was 
empty, but I suppose that does not have to be the case.  If a tablet has data, 
then a generator iterator would have to merge the generated data with existing 
data while respecting sorting.  Also the generator iterator would need to only 
generate data within the bounds of the table.  Actually writing one of these 
that is correct seems non-trivial.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Accumulo-Master - Build # 3115 - Fixed

2020-07-02 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Master (build #3115)

Status: Fixed

Check console output at https://builds.apache.org/job/Accumulo-Master/3115/ to 
view the results.

[GitHub] [accumulo] ctubbsii commented on issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


ctubbsii commented on issue #1647:
URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653226825


   My main concern is inconsistent behavior. Iterators that generate data will 
work when there is existing files... but not work when there isn't. This is 
very unintuitive, and the actual state that the behavior turns on is invisible 
to the end user who executes a compaction.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner opened a new pull request #1649: Addresses three compaction service follow on issues.

2020-07-02 Thread GitBox


keith-turner opened a new pull request #1649:
URL: https://github.com/apache/accumulo/pull/1649


   While waiting for #1632 to be reviewed  I completed multiple follow on 
issues for the new compaction service.  I am submitting those as a single PR, 
but there is a commit for each. I can break them into multiple PRs if desired.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner merged pull request #1632: Fix #1609 dynamically reinitialize compaction services when config ch…

2020-07-02 Thread GitBox


keith-turner merged pull request #1632:
URL: https://github.com/apache/accumulo/pull/1632


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner closed issue #1609: Reinitialize compaction services when configuration changes

2020-07-02 Thread GitBox


keith-turner closed issue #1609:
URL: https://github.com/apache/accumulo/issues/1609


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner closed issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


keith-turner closed issue #1647:
URL: https://github.com/apache/accumulo/issues/1647


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] milleruntime commented on issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


milleruntime commented on issue #1647:
URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653182711


   From @EdColeman  
   Now sure if this helps - but there is a use case for having an empty tablet. 
 If a user processes a data set and generates multiple tables, say data, 
metrics and errors - the ideal case would be that there are no errors (so an 
empty table).  Rather than handling the logic of if the table does not exist, 
there must be no errors, it was simpler for their query / scan logic to have 
the empty table.  The idea is that if there are issues, they could scan the 
errors to determine where things are going off the rails.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] keith-turner commented on a change in pull request #1635: Create bulkRename method in VolumeManager

2020-07-02 Thread GitBox


keith-turner commented on a change in pull request #1635:
URL: https://github.com/apache/accumulo/pull/1635#discussion_r449212013



##
File path: 
server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java
##
@@ -289,6 +295,45 @@ public boolean rename(Path path, Path newPath) throws 
IOException {
 return source.rename(path, newPath);
   }
 
+  @Override
+  public void bulkRename(Map oldToNewPathMap, int poolSize, String 
poolName,
+  String transactionId) throws IOException {
+List> results = new ArrayList<>();
+SimpleThreadPool workerPool = new SimpleThreadPool(poolSize, poolName);
+oldToNewPathMap.forEach((oldPath, newPath) -> 
results.add(workerPool.submit(() -> {
+  boolean success;
+  try {
+success = rename(oldPath, newPath);
+  } catch (IOException e) {
+// The rename could have failed because this is the second time its 
running (failures
+// could cause this to run multiple times).
+if (!exists(newPath) || exists(oldPath)) {
+  throw e;
+}
+log.debug(
+"Ignoring rename exception for {} because destination already 
exists. orig: {} new: {}",
+transactionId, oldPath, newPath, e);
+success = true;
+  }
+  if (!success) {
+log.error("Rename operation {} returned false. orig: {} new: {}", 
transactionId, oldPath,
+newPath);
+  } else if (log.isTraceEnabled()) {

Review comment:
   Seems like this should do the following.
   
   ```suggestion
 if (!success && (!exists(newPath) || exists(oldPath))) {
 throw new IOException("Rename operation "+transactionId+" returned 
false. orig: "+oldPath+" new: "+newPath);
 } else if (log.isTraceEnabled()) {
   ```

##
File path: 
server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManager.java
##
@@ -149,6 +150,11 @@ FSDataOutputStream createSyncable(Path logPath, int 
buffersize, short replicatio
   // volumes
   boolean rename(Path path, Path newPath) throws IOException;
 
+  // rename lots of files at once in a thread pool
+  // returns once all the threads have completed

Review comment:
   could add something to the docs about this operation being idempotent, 
safe to call again after previous partial run of bulkRename





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] milleruntime commented on a change in pull request #1635: Create bulkRename method in VolumeManager

2020-07-02 Thread GitBox


milleruntime commented on a change in pull request #1635:
URL: https://github.com/apache/accumulo/pull/1635#discussion_r449123192



##
File path: 
server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java
##
@@ -289,6 +294,38 @@ public boolean rename(Path path, Path newPath) throws 
IOException {
 return source.rename(path, newPath);
   }
 
+  @Override
+  public List> bulkRename(Map oldToNewPathMap,
+  SimpleThreadPool workerPool, String transactionId) throws 
InterruptedException {
+List> results = new ArrayList<>();
+oldToNewPathMap.forEach((oldPath, newPath) -> 
results.add(workerPool.submit(() -> {
+  boolean success;
+  try {
+success = rename(oldPath, newPath);
+  } catch (IOException e) {
+// The rename could have failed because this is the second time its 
running (failures
+// could cause this to run multiple times).
+if (!exists(newPath) || exists(oldPath)) {
+  throw e;
+}
+log.debug(
+"Ignoring rename exception for {} because destination already 
exists. orig: {} new: {}",
+transactionId, oldPath, newPath, e);
+success = true;
+  }
+  if (!success) {
+log.error("Rename operation {} returned false. orig: {} new: {}", 
transactionId, oldPath,

Review comment:
   This did seem to clean up the handling in the FATE methods.  Checkout my 
latest, I think this is ready to go now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] milleruntime commented on a change in pull request #1635: Create bulkRename method in VolumeManager

2020-07-02 Thread GitBox


milleruntime commented on a change in pull request #1635:
URL: https://github.com/apache/accumulo/pull/1635#discussion_r449082986



##
File path: 
server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java
##
@@ -289,6 +294,39 @@ public boolean rename(Path path, Path newPath) throws 
IOException {
 return source.rename(path, newPath);
   }
 
+  @Override
+  public List> bulkRename(Map oldToNewPathMap, int 
poolSize,
+  String poolName, String transactionId) throws InterruptedException {
+List> results = new ArrayList<>();
+SimpleThreadPool workerPool = new SimpleThreadPool(poolSize, poolName);
+oldToNewPathMap.forEach((oldPath, newPath) -> 
results.add(workerPool.submit(() -> {
+  boolean success;
+  try {
+success = rename(oldPath, newPath);
+  } catch (IOException e) {
+// The rename could have failed because this is the second time its 
running (failures
+// could cause this to run multiple times).
+if (!exists(newPath) || exists(oldPath)) {

Review comment:
   I decided since we are already logging an error per file when rename 
returns false, we should not also throw a new error, failing the transation.  
It seems like this is just to rethrow any errors we may care about.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [accumulo] milleruntime commented on issue #1647: Compaction tests failing in TableOperationsIT

2020-07-02 Thread GitBox


milleruntime commented on issue #1647:
URL: https://github.com/apache/accumulo/issues/1647#issuecomment-653024418


   With the limited full time contributors we have now, I would error on the 
side of less complexity by removing the test.  With this feature not 
advertised, it is possible, even likely, that it is not used.  As with all new 
features, there is new behavior so users have a different set of expectations.  
We can always add the complexity later if there is a legitimate use case.  I 
would remove the ```@Test``` annotation with a comment, citing the issue number 
with your comments.  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org