[PR] Admin service status server lock changes [accumulo]

2024-05-24 Thread via GitHub


EdColeman opened a new pull request, #4605:
URL: https://github.com/apache/accumulo/pull/4605

   The service host information stored in ZooKeeper changed between 2.1 and 
3.1.x.  3.1.x uses ServiceLockData and ServiceDescriptors.  These changes adopt 
the admin serviceStatus command to use the new data formats.
   
   Fixes #4603 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fixes bug in merge tablet reservations [accumulo]

2024-05-24 Thread via GitHub


keith-turner merged PR #4604:
URL: https://github.com/apache/accumulo/pull/4604


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] fixes bug in merge tablet reservations [accumulo]

2024-05-24 Thread via GitHub


keith-turner opened a new pull request, #4604:
URL: https://github.com/apache/accumulo/pull/4604

   The fate step that reserves tablets for merge was signaling it was ready 
when it was not in the case where its scan of the metadata table took 0ms.  
This was causing some ITs to fail.  The bug was introduced in #4574


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] The admin serviceStatus command needs additional changes from 2.1 [accumulo]

2024-05-24 Thread via GitHub


EdColeman opened a new issue, #4603:
URL: https://github.com/apache/accumulo/issues/4603

   The merge of 2.1 with the service status command is in 8e5d967f24 is 
complete because the lock data changed with the use of ServiceLock.  The 
changes are large enough that a separate PR will be used to fix the command.
   
   Currently the command will not print the correct host:port and the counts 
may be off because it is treating the entire service descriptor as the 
host:port.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Enable background refresh for the scan server tablet metadata cache [accumulo]

2024-05-24 Thread via GitHub


cshannon commented on code in PR #4551:
URL: https://github.com/apache/accumulo/pull/4551#discussion_r1614001721


##
server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java:
##
@@ -247,9 +247,31 @@ public ScanServer(ScanServerOpts opts, String[] args) {
 LOG.warn(
 "Tablet metadata caching less than one minute, may cause excessive 
scans on metadata table.");
   }
-  tabletMetadataCache =
-  Caffeine.newBuilder().expireAfterWrite(cacheExpiration, 
TimeUnit.MILLISECONDS)
-  
.scheduler(Scheduler.systemScheduler()).recordStats().build(tabletMetadataLoader);
+
+  // Get the cache refresh percentage property
+  // Value must be less than 100% as 100 or over would effectively disable 
it
+  double cacheRefreshPercentage =
+  
getConfiguration().getFraction(Property.SSERV_CACHED_TABLET_METADATA_REFRESH_PERCENT);
+  Preconditions.checkArgument(cacheRefreshPercentage < cacheExpiration,
+  "Tablet metadata cache refresh percentage is '%s' but must be less 
than 1",
+  cacheRefreshPercentage);
+
+  var builder = Caffeine.newBuilder().expireAfterWrite(cacheExpiration, 
TimeUnit.MILLISECONDS)
+  
.scheduler(Scheduler.systemScheduler()).executor(context.getScheduledExecutor())

Review Comment:
   I pushed one more update that enables metrics so we can monitor the thread 
pool to see if we need to make adjustments to the size



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Enable background refresh for the scan server tablet metadata cache [accumulo]

2024-05-24 Thread via GitHub


cshannon commented on code in PR #4551:
URL: https://github.com/apache/accumulo/pull/4551#discussion_r1613998074


##
server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java:
##
@@ -247,9 +247,31 @@ public ScanServer(ScanServerOpts opts, String[] args) {
 LOG.warn(
 "Tablet metadata caching less than one minute, may cause excessive 
scans on metadata table.");
   }
-  tabletMetadataCache =
-  Caffeine.newBuilder().expireAfterWrite(cacheExpiration, 
TimeUnit.MILLISECONDS)
-  
.scheduler(Scheduler.systemScheduler()).recordStats().build(tabletMetadataLoader);
+
+  // Get the cache refresh percentage property
+  // Value must be less than 100% as 100 or over would effectively disable 
it
+  double cacheRefreshPercentage =
+  
getConfiguration().getFraction(Property.SSERV_CACHED_TABLET_METADATA_REFRESH_PERCENT);
+  Preconditions.checkArgument(cacheRefreshPercentage < cacheExpiration,
+  "Tablet metadata cache refresh percentage is '%s' but must be less 
than 1",
+  cacheRefreshPercentage);
+
+  var builder = Caffeine.newBuilder().expireAfterWrite(cacheExpiration, 
TimeUnit.MILLISECONDS)
+  
.scheduler(Scheduler.systemScheduler()).executor(context.getScheduledExecutor())

Review Comment:
   I created a dedicated executor with a fixed pool size of 8 for now to keep 
it simple. It's a sync cache and we are not using it for loading just for other 
things like refresh, expiration tasks, etc. If we are not keeping up or need 
more control we could add a property to make the size configurable or bump up 
the executor size.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] adds trace logging to CompactionDriver [accumulo]

2024-05-24 Thread via GitHub


keith-turner merged PR #4578:
URL: https://github.com/apache/accumulo/pull/4578


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] ServiceLockData for monitor should include port [accumulo]

2024-05-24 Thread via GitHub


EdColeman opened a new issue, #4602:
URL: https://github.com/apache/accumulo/issues/4602

   The service lock data for the monitor should include the port in the field 
returned by getAddress.
   
   Currently in ZooKeeper:
   
   ```
   > get /accumulo/[IID]/monitor/lock/zlock#[UUID]#01
   
   
{"descriptors":[{"uuid":"87465459-9c8f-4f95-b4c6-ef3029030d05","service":"NONE","address":"localhost","group":"default"}]}
   
   ```
   
   It should look like (from a manager lock):
   
   ```
   > get /accumulo/[IID]/managers/lock/zlock#[UUID]#01
   
   
{"descriptors":[{"uuid":"f6e5ac4b-3d9a-4e90-be28-28a9fc1cc085","service":"MANAGER","address":"localhost:","group":"default"}]}
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Normalize metrics names [accumulo]

2024-05-24 Thread via GitHub


keith-turner commented on code in PR #4535:
URL: https://github.com/apache/accumulo/pull/4535#discussion_r1613900751


##
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##
@@ -482,12 +484,16 @@ private SortedMap 
computeMappingFromFiles(FileSystem fs, T
 if (this.executor != null) {
   executor = this.executor;
 } else if (numThreads > 0) {
-  executor = service = 
context.threadPools().getPoolBuilder("BulkImportThread")
-  .numCoreThreads(numThreads).build();
+  executor = service =
+  context.threadPools().getPoolBuilder(METRICS_POOL_PREFIX + 
"bulk.import.service")

Review Comment:
   This is a client side thread pool for bulk load, could use the following name
   
   ```suggestion
 context.threadPools().getPoolBuilder(METRICS_POOL_PREFIX + 
"client.bulk.load")
   ```



##
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java:
##
@@ -306,25 +311,25 @@ public TabletServerResourceManager(ServerContext context, 
TabletHostingServer ts
 Property.TSERV_MINC_MAXCONCURRENT, true);
 modifyThreadPoolSizesAtRuntime(
 () -> 
context.getConfiguration().getCount(Property.TSERV_MINC_MAXCONCURRENT),
-"minor compactor", minorCompactionThreadPool);
+METRICS_TSERVER_MINOR_COMPACTOR_POOL, minorCompactionThreadPool);
 
-splitThreadPool = 
ThreadPools.getServerThreadPools().getPoolBuilder("splitter")
-.numCoreThreads(0).numMaxThreads(1).withTimeOut(1, SECONDS)
-.enableThreadPoolMetrics(enableMetrics).build();
+splitThreadPool = ThreadPools.getServerThreadPools()
+.getPoolBuilder(METRICS_POOL_PREFIX + 
"tserver.minor.compactor").numCoreThreads(0)

Review Comment:
   I think this is misnamed, should indicate split in the name and not minor 
compaction.



##
server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java:
##
@@ -362,8 +364,8 @@ private Map> estimateSizes(final 
VolumeManager vm,
 
 final Map> ais = Collections.synchronizedMap(new 
TreeMap<>());
 
-ExecutorService threadPool = 
ThreadPools.getServerThreadPools().getPoolBuilder("estimateSizes")
-.numCoreThreads(numThreads).build();
+ExecutorService threadPool = ThreadPools.getServerThreadPools()
+
.getPoolBuilder("bulk.import.estimate.sizes.pool").numCoreThreads(numThreads).build();

Review Comment:
   Should this use the METRICS_POOL_PREFIX? 



##
server/base/src/main/java/org/apache/accumulo/server/conf/store/impl/PropCacheCaffeineImpl.java:
##
@@ -45,8 +46,8 @@ public class PropCacheCaffeineImpl implements PropCache {
   public static final int EXPIRE_MIN = 60;
   private static final Logger log = 
LoggerFactory.getLogger(PropCacheCaffeineImpl.class);
   private static final Executor executor =
-  
ThreadPools.getServerThreadPools().getPoolBuilder("caffeine-tasks").numCoreThreads(1)
-  .numMaxThreads(20).withTimeOut(60L, SECONDS).build();
+  ThreadPools.getServerThreadPools().getPoolBuilder(METRICS_POOL_PREFIX + 
"caffeine.task")

Review Comment:
   Would it be useful to include something in the name that indicates the pool 
is used for a cache that is related to properties?



##
core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java:
##
@@ -71,9 +72,9 @@ protected TabletServerBatchReader(ClientContext context, 
Class scopeClass, Ta
 this.tableName = tableName;
 this.numThreads = numQueryThreads;
 
-queryThreadPool =
-context.threadPools().getPoolBuilder("batch scanner " + 
batchReaderInstance + "-")
-.numCoreThreads(numQueryThreads).build();
+queryThreadPool = context.threadPools()
+.getPoolBuilder(METRICS_POOL_PREFIX + "tserver.batch.reader.scanner" + 
batchReaderInstance)

Review Comment:
   Could drop `tserver` in the name as these threads run in client code, 
usually outside if tserver.  Could use `client` instead to indicate a client 
side thread pool.



##
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java:
##
@@ -335,31 +340,31 @@ public TabletServerResourceManager(ServerContext context, 
TabletHostingServer ts
 Property.TSERV_ASSIGNMENT_MAXCONCURRENT, enableMetrics);
 modifyThreadPoolSizesAtRuntime(
 () -> 
context.getConfiguration().getCount(Property.TSERV_ASSIGNMENT_MAXCONCURRENT),
-"tablet assignment", assignmentPool);
+"tserver.tablet.assignment.pool", assignmentPool);

Review Comment:
   Why was this change made?  Getting lost in the diffs trying to understand 
the bigger picture.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact 

Re: [PR] Use Duration instead of long+TimeUnit in ReadOnlyTStore.unreserve [accumulo]

2024-05-24 Thread via GitHub


DomGarguilo merged PR #4371:
URL: https://github.com/apache/accumulo/pull/4371


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Normalize metrics names [accumulo]

2024-05-24 Thread via GitHub


keith-turner commented on code in PR #4535:
URL: https://github.com/apache/accumulo/pull/4535#discussion_r1613826966


##
core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java:
##
@@ -495,8 +496,8 @@ public void addSplits(String tableName, SortedSet 
partitionKeys)
 CountDownLatch latch = new CountDownLatch(splits.size());
 AtomicReference exception = new AtomicReference<>(null);
 
-ExecutorService executor =
-
context.threadPools().getPoolBuilder("addSplits").numCoreThreads(16).build();
+ExecutorService executor = context.threadPools()
+.getPoolBuilder(METRICS_POOL_PREFIX + 
"table.ops.add.splits").numCoreThreads(16).build();

Review Comment:
   That sounds like a nice goal.  Was asking because the description of the PR 
was focused on metrics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] speed up merge operation for lots of tablets [accumulo]

2024-05-24 Thread via GitHub


keith-turner merged PR #4574:
URL: https://github.com/apache/accumulo/pull/4574


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] improves fate execution of in progress transactions [accumulo]

2024-05-24 Thread via GitHub


keith-turner commented on code in PR #4589:
URL: https://github.com/apache/accumulo/pull/4589#discussion_r1613822956


##
core/src/main/java/org/apache/accumulo/core/fate/MetaFateStore.java:
##
@@ -356,9 +358,9 @@ protected FateInstanceType getInstanceType() {
   }
 
   @Override
-  protected Stream getTransactions() {
+  protected Stream getTransactions(Set statuses) {

Review Comment:
   Possibly.  Its tricky, many times using streams is fine because of how the 
code is called.  In this case I don't have a good sense of if the stream is 
impacting things significantly.   It could impact an operation that wants to 
zip through all of the fate ids as quickly as possible.  For running fate ops, 
suspect list the fate op ids is not the bottleneck.  However there is other 
code that uses this that is not running fate ops, but is examining them so not 
sure about that code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Enable background refresh for the scan server tablet metadata cache [accumulo]

2024-05-24 Thread via GitHub


keith-turner commented on code in PR #4551:
URL: https://github.com/apache/accumulo/pull/4551#discussion_r1613816597


##
server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java:
##
@@ -247,9 +247,31 @@ public ScanServer(ScanServerOpts opts, String[] args) {
 LOG.warn(
 "Tablet metadata caching less than one minute, may cause excessive 
scans on metadata table.");
   }
-  tabletMetadataCache =
-  Caffeine.newBuilder().expireAfterWrite(cacheExpiration, 
TimeUnit.MILLISECONDS)
-  
.scheduler(Scheduler.systemScheduler()).recordStats().build(tabletMetadataLoader);
+
+  // Get the cache refresh percentage property
+  // Value must be less than 100% as 100 or over would effectively disable 
it
+  double cacheRefreshPercentage =
+  
getConfiguration().getFraction(Property.SSERV_CACHED_TABLET_METADATA_REFRESH_PERCENT);
+  Preconditions.checkArgument(cacheRefreshPercentage < cacheExpiration,
+  "Tablet metadata cache refresh percentage is '%s' but must be less 
than 1",
+  cacheRefreshPercentage);
+
+  var builder = Caffeine.newBuilder().expireAfterWrite(cacheExpiration, 
TimeUnit.MILLISECONDS)
+  
.scheduler(Scheduler.systemScheduler()).executor(context.getScheduledExecutor())

Review Comment:
   A dedicated thread pool for the cache would be nice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] speed up merge operation for lots of tablets [accumulo]

2024-05-24 Thread via GitHub


keith-turner commented on PR #4574:
URL: https://github.com/apache/accumulo/pull/4574#issuecomment-2130056612

   A lot of the changes that were in this PR are now in the commits 
219d4f224e1d97d2bd7cd5dec5ec5968e89facef and 
ef213b520b2485f356cec7fd65f3ff4b66f43840


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add service status as command option to admin command [accumulo]

2024-05-24 Thread via GitHub


EdColeman merged PR #4567:
URL: https://github.com/apache/accumulo/pull/4567


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] improves fate execution of in progress transactions [accumulo]

2024-05-24 Thread via GitHub


keith-turner merged PR #4589:
URL: https://github.com/apache/accumulo/pull/4589


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Log when a tablet has been in the migrating state for an extended period [accumulo]

2024-05-24 Thread via GitHub


dlmarion closed issue #4539: Log when a tablet has been in the migrating state 
for an extended period
URL: https://github.com/apache/accumulo/issues/4539


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]

2024-05-24 Thread via GitHub


dlmarion merged PR #4558:
URL: https://github.com/apache/accumulo/pull/4558


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] adds iterator for filtering on fate key type [accumulo]

2024-05-24 Thread via GitHub


keith-turner merged PR #4590:
URL: https://github.com/apache/accumulo/pull/4590


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Add IT to verify Scan Servers remove their references on shutdown [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on issue #4594:
URL: https://github.com/apache/accumulo/issues/4594#issuecomment-2129945412

   An IT already exists for this, see 
https://github.com/apache/accumulo/blob/2.1/test/src/main/java/org/apache/accumulo/test/ScanServerMetadataEntriesIT.java#L140


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add metrics for compactor entries read and write [accumulo]

2024-05-24 Thread via GitHub


DomGarguilo closed pull request #4572: Add metrics for compactor entries read 
and write
URL: https://github.com/apache/accumulo/pull/4572


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] replaces sorted map w/ list for key vals in tablet metadata [accumulo]

2024-05-24 Thread via GitHub


keith-turner merged PR #4600:
URL: https://github.com/apache/accumulo/pull/4600


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add new scan metrics to flaky list in MetricsIT [accumulo]

2024-05-24 Thread via GitHub


EdColeman merged PR #4601:
URL: https://github.com/apache/accumulo/pull/4601


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Add new scan metrics to flaky list in MetricsIT [accumulo]

2024-05-24 Thread via GitHub


EdColeman opened a new pull request, #4601:
URL: https://github.com/apache/accumulo/pull/4601

- METRICS_SCAN_RESERVATION_WRITEOUT_TIMER
- METRICS_SCAN_RESERVATION_CONFLICT_COUNTER


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] accumulo-cluster does not start a CompactionCoordinator on different host [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on issue #4597:
URL: https://github.com/apache/accumulo/issues/4597#issuecomment-2129819681

   I was unable to replicate this. I modified `accumulo-cluster` locally to 
emit debug output and so that it would always take the `ssh` path. I set the 
hosts in cluster.yaml to the public IP address of my dev box, but set my 
manager address to `localhost`. I saw the ssh command come across and 
compaction-coordinator logs appeared in the logs directory (Accumulo wouldn't 
start because I had ZK and Hadoop down). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Add ability to filter out metrics [accumulo]

2024-05-24 Thread via GitHub


keith-turner commented on issue #4599:
URL: https://github.com/apache/accumulo/issues/4599#issuecomment-2129582096

   > We don't necessarily have to implement something to provide some base 
functionality. A user can extend one of our existing implementations or create 
their own to provide this functionality.
   
   Wondering if it would be useful to provide some kind of help in the SPI for 
building micrometer filters related to Accumulo metrics. One possible way to do 
this would be to have constants related to metrics in the SPI.  This could 
enable writing a  MeterRegistryFactory that manipulates Accumulo metrics using 
those constants.   Could have something like the following.
   
   ```java
   package org.apache.accumulo.core.spi.metrics;
   
   class MetricsConstants {
   
  static class Tags {
  String INSTANCE_TAG = "instance.name";
 }
   
 static class Meters {
  String COMPACTIONS_QUEUED = "...";
 }
   }
   
   ```
   
   With the above then could have  a scenario like the following.
   
* Custom MeterRegistryFactory is written against 3.1 that drops the 
instance name tag.  Its impl uses the `INSTANCE_TAG` constant from the SPI.
* Accumulo 4.0 makes a major changes to the instance name tag that user 
need to consider.  This change deletes  INSTANCE_TAG constant from the SPI.
* The custom MeterRegistryFactory written against 3.1 no longer compiles 
against 4.0 forcing an investigation of the change.
   
   Completely unrelated to metrics, but if this approach of putting metrics 
related constants in the SPI works then it could also potentially work for 
Accumulo's properties.  Could have constants related to those in the SPI maybe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on code in PR #4558:
URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613492443


##
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##
@@ -164,9 +167,10 @@ public long getDataSourceDeletions() {
   }
 
   private enum CloseState {
-OPEN, CLOSING, CLOSED, COMPLETE
+OPEN, REQUESTED, CLOSING, CLOSED, COMPLETE
   }
 
+  private volatile long closeRequestTime = 0;

Review Comment:
   Removed volatile modifier in 3fa4f87



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on code in PR #4558:
URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613492183


##
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##
@@ -905,6 +909,19 @@ public void close(boolean saveState) throws IOException {
   void initiateClose(boolean saveState) {
 log.trace("initiateClose(saveState={}) {}", saveState, getExtent());
 
+synchronized (this) {
+  if (closeState == CloseState.OPEN) {
+closeRequestTime = System.nanoTime();
+  } else if (closeRequestTime != 0) {
+long runningTime = Duration.ofNanos(System.nanoTime() - 
closeRequestTime).toMinutes();
+if (runningTime >= 15) {
+  DEDUPE_LOGGER.info("Tablet {} close requested again, but has been 
closing for {} minutes",
+  this.extent, runningTime);
+}
+  }
+  closeState = CloseState.REQUESTED;

Review Comment:
   Change made in 3fa4f87



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on code in PR #4558:
URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613491599


##
core/src/main/java/org/apache/accumulo/core/logging/ConditionalLogger.java:
##
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.logging;
+
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.BiFunction;
+
+import org.apache.accumulo.core.util.Pair;
+import org.slf4j.Logger;
+import org.slf4j.Marker;
+import org.slf4j.event.Level;
+import org.slf4j.helpers.AbstractLogger;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+
+/**
+ * Logger that wraps another Logger and only emits a log message once per the 
supplied duration.
+ *
+ */
+public abstract class ConditionalLogger extends AbstractLogger {
+
+  private static final long serialVersionUID = 1L;
+
+  /**
+   * A Logger implementation that will log a message at the supplied elevated 
level if it has not
+   * been seen in the supplied duration. For repeat occurrences the message 
will be logged at the
+   * level used in code (which is likely a lower level). Note that the first 
log message will be
+   * logged at the elevated level because it has not been seen before.
+   */
+  public static class EscalatingLogger extends DeduplicatingLogger {
+
+private static final long serialVersionUID = 1L;
+private final Level elevatedLevel;
+
+public EscalatingLogger(Logger log, Duration threshold, Level 
elevatedLevel) {
+  super(log, threshold);
+  this.elevatedLevel = elevatedLevel;
+}
+
+@Override
+protected void handleNormalizedLoggingCall(Level level, Marker marker, 
String messagePattern,
+Object[] arguments, Throwable throwable) {
+
+  if (arguments == null) {
+arguments = new Object[0];
+  }
+  if (!condition.apply(messagePattern, Arrays.asList(arguments))) {
+
delegate.atLevel(level).addMarker(marker).setCause(throwable).log(messagePattern,
+arguments);
+  } else {
+
delegate.atLevel(elevatedLevel).addMarker(marker).setCause(throwable).log(messagePattern,
+arguments);
+  }
+
+}
+
+  }
+
+  /**
+   * A Logger implementation that will suppress duplicate messages within the 
supplied duration.
+   */
+  public static class DeduplicatingLogger extends ConditionalLogger {
+
+private static final long serialVersionUID = 1L;
+
+public DeduplicatingLogger(Logger log, Duration threshold) {
+  super(log, new BiFunction<>() {
+
+private final Cache>,Boolean> cache =
+
Caffeine.newBuilder().expireAfterWrite(threshold).maximumSize(250).build();

Review Comment:
   I made the cache size configurable in 3fa4f87, and set it to 1000 in the 
stuck logger in Tablet



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on code in PR #4558:
URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613490374


##
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##
@@ -905,6 +909,19 @@ public void close(boolean saveState) throws IOException {
   void initiateClose(boolean saveState) {
 log.trace("initiateClose(saveState={}) {}", saveState, getExtent());
 
+synchronized (this) {
+  if (closeState == CloseState.OPEN) {
+closeRequestTime = System.nanoTime();
+  } else if (closeRequestTime != 0) {

Review Comment:
   Implemented this change in 3fa4f87



##
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##
@@ -141,6 +143,7 @@
  */
 public class Tablet extends TabletBase {
   private static final Logger log = LoggerFactory.getLogger(Tablet.class);
+  private static final Logger DEDUPE_LOGGER = new DeduplicatingLogger(log, 
Duration.ofMinutes(5));

Review Comment:
   Renamed logger in 3fa4f87



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] handles failure in compaction planner [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on code in PR #4586:
URL: https://github.com/apache/accumulo/pull/4586#discussion_r1613446675


##
server/base/src/main/java/org/apache/accumulo/server/compaction/CompactionJobGenerator.java:
##
@@ -162,23 +166,29 @@ public Map getExecutionHints() {
 return dispatcher.dispatch(dispatchParams).getService();
   }
 
+  private Level getLevel(TableId tableId, CompactionServiceId serviceId,

Review Comment:
   Curious if you can use the EscalatingLogger ( or some other variant of 
ConditionalLogger) that's included in #4558 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] improves fate execution of in progress transactions [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on code in PR #4589:
URL: https://github.com/apache/accumulo/pull/4589#discussion_r1613395827


##
core/src/main/java/org/apache/accumulo/core/fate/MetaFateStore.java:
##
@@ -356,9 +358,9 @@ protected FateInstanceType getInstanceType() {
   }
 
   @Override
-  protected Stream getTransactions() {
+  protected Stream getTransactions(Set statuses) {

Review Comment:
   If the changes in this PR are for performance, I wonder if we should turn 
this stream back into a for loop.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Modified FATE to allow transactions to run to completion without interruption [accumulo]

2024-05-24 Thread via GitHub


dlmarion closed pull request #3852: Modified FATE to allow transactions to run 
to completion without interruption
URL: https://github.com/apache/accumulo/pull/3852


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Modified FATE to allow transactions to run to completion without interruption [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on PR #3852:
URL: https://github.com/apache/accumulo/pull/3852#issuecomment-2129358394

   Superceded by #4589 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Add ability to filter out metrics [accumulo]

2024-05-24 Thread via GitHub


dlmarion commented on issue #4599:
URL: https://github.com/apache/accumulo/issues/4599#issuecomment-2129336830

   So, that's a fair point. We don't necessarily have to implement something to 
provide some base functionality. A user can extend one of our existing 
implementations or create their own to provide this functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fixed NPE in ScanServerMetrics [accumulo]

2024-05-24 Thread via GitHub


dlmarion merged PR #4598:
URL: https://github.com/apache/accumulo/pull/4598


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org