[PR] Admin service status server lock changes [accumulo]
EdColeman opened a new pull request, #4605: URL: https://github.com/apache/accumulo/pull/4605 The service host information stored in ZooKeeper changed between 2.1 and 3.1.x. 3.1.x uses ServiceLockData and ServiceDescriptors. These changes adopt the admin serviceStatus command to use the new data formats. Fixes #4603 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] fixes bug in merge tablet reservations [accumulo]
keith-turner merged PR #4604: URL: https://github.com/apache/accumulo/pull/4604 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] fixes bug in merge tablet reservations [accumulo]
keith-turner opened a new pull request, #4604: URL: https://github.com/apache/accumulo/pull/4604 The fate step that reserves tablets for merge was signaling it was ready when it was not in the case where its scan of the metadata table took 0ms. This was causing some ITs to fail. The bug was introduced in #4574 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] The admin serviceStatus command needs additional changes from 2.1 [accumulo]
EdColeman opened a new issue, #4603: URL: https://github.com/apache/accumulo/issues/4603 The merge of 2.1 with the service status command is in 8e5d967f24 is complete because the lock data changed with the use of ServiceLock. The changes are large enough that a separate PR will be used to fix the command. Currently the command will not print the correct host:port and the counts may be off because it is treating the entire service descriptor as the host:port. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Enable background refresh for the scan server tablet metadata cache [accumulo]
cshannon commented on code in PR #4551: URL: https://github.com/apache/accumulo/pull/4551#discussion_r1614001721 ## server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java: ## @@ -247,9 +247,31 @@ public ScanServer(ScanServerOpts opts, String[] args) { LOG.warn( "Tablet metadata caching less than one minute, may cause excessive scans on metadata table."); } - tabletMetadataCache = - Caffeine.newBuilder().expireAfterWrite(cacheExpiration, TimeUnit.MILLISECONDS) - .scheduler(Scheduler.systemScheduler()).recordStats().build(tabletMetadataLoader); + + // Get the cache refresh percentage property + // Value must be less than 100% as 100 or over would effectively disable it + double cacheRefreshPercentage = + getConfiguration().getFraction(Property.SSERV_CACHED_TABLET_METADATA_REFRESH_PERCENT); + Preconditions.checkArgument(cacheRefreshPercentage < cacheExpiration, + "Tablet metadata cache refresh percentage is '%s' but must be less than 1", + cacheRefreshPercentage); + + var builder = Caffeine.newBuilder().expireAfterWrite(cacheExpiration, TimeUnit.MILLISECONDS) + .scheduler(Scheduler.systemScheduler()).executor(context.getScheduledExecutor()) Review Comment: I pushed one more update that enables metrics so we can monitor the thread pool to see if we need to make adjustments to the size -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Enable background refresh for the scan server tablet metadata cache [accumulo]
cshannon commented on code in PR #4551: URL: https://github.com/apache/accumulo/pull/4551#discussion_r1613998074 ## server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java: ## @@ -247,9 +247,31 @@ public ScanServer(ScanServerOpts opts, String[] args) { LOG.warn( "Tablet metadata caching less than one minute, may cause excessive scans on metadata table."); } - tabletMetadataCache = - Caffeine.newBuilder().expireAfterWrite(cacheExpiration, TimeUnit.MILLISECONDS) - .scheduler(Scheduler.systemScheduler()).recordStats().build(tabletMetadataLoader); + + // Get the cache refresh percentage property + // Value must be less than 100% as 100 or over would effectively disable it + double cacheRefreshPercentage = + getConfiguration().getFraction(Property.SSERV_CACHED_TABLET_METADATA_REFRESH_PERCENT); + Preconditions.checkArgument(cacheRefreshPercentage < cacheExpiration, + "Tablet metadata cache refresh percentage is '%s' but must be less than 1", + cacheRefreshPercentage); + + var builder = Caffeine.newBuilder().expireAfterWrite(cacheExpiration, TimeUnit.MILLISECONDS) + .scheduler(Scheduler.systemScheduler()).executor(context.getScheduledExecutor()) Review Comment: I created a dedicated executor with a fixed pool size of 8 for now to keep it simple. It's a sync cache and we are not using it for loading just for other things like refresh, expiration tasks, etc. If we are not keeping up or need more control we could add a property to make the size configurable or bump up the executor size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] adds trace logging to CompactionDriver [accumulo]
keith-turner merged PR #4578: URL: https://github.com/apache/accumulo/pull/4578 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] ServiceLockData for monitor should include port [accumulo]
EdColeman opened a new issue, #4602: URL: https://github.com/apache/accumulo/issues/4602 The service lock data for the monitor should include the port in the field returned by getAddress. Currently in ZooKeeper: ``` > get /accumulo/[IID]/monitor/lock/zlock#[UUID]#01 {"descriptors":[{"uuid":"87465459-9c8f-4f95-b4c6-ef3029030d05","service":"NONE","address":"localhost","group":"default"}]} ``` It should look like (from a manager lock): ``` > get /accumulo/[IID]/managers/lock/zlock#[UUID]#01 {"descriptors":[{"uuid":"f6e5ac4b-3d9a-4e90-be28-28a9fc1cc085","service":"MANAGER","address":"localhost:","group":"default"}]} ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Normalize metrics names [accumulo]
keith-turner commented on code in PR #4535: URL: https://github.com/apache/accumulo/pull/4535#discussion_r1613900751 ## core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java: ## @@ -482,12 +484,16 @@ private SortedMap computeMappingFromFiles(FileSystem fs, T if (this.executor != null) { executor = this.executor; } else if (numThreads > 0) { - executor = service = context.threadPools().getPoolBuilder("BulkImportThread") - .numCoreThreads(numThreads).build(); + executor = service = + context.threadPools().getPoolBuilder(METRICS_POOL_PREFIX + "bulk.import.service") Review Comment: This is a client side thread pool for bulk load, could use the following name ```suggestion context.threadPools().getPoolBuilder(METRICS_POOL_PREFIX + "client.bulk.load") ``` ## server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java: ## @@ -306,25 +311,25 @@ public TabletServerResourceManager(ServerContext context, TabletHostingServer ts Property.TSERV_MINC_MAXCONCURRENT, true); modifyThreadPoolSizesAtRuntime( () -> context.getConfiguration().getCount(Property.TSERV_MINC_MAXCONCURRENT), -"minor compactor", minorCompactionThreadPool); +METRICS_TSERVER_MINOR_COMPACTOR_POOL, minorCompactionThreadPool); -splitThreadPool = ThreadPools.getServerThreadPools().getPoolBuilder("splitter") -.numCoreThreads(0).numMaxThreads(1).withTimeOut(1, SECONDS) -.enableThreadPoolMetrics(enableMetrics).build(); +splitThreadPool = ThreadPools.getServerThreadPools() +.getPoolBuilder(METRICS_POOL_PREFIX + "tserver.minor.compactor").numCoreThreads(0) Review Comment: I think this is misnamed, should indicate split in the name and not minor compaction. ## server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java: ## @@ -362,8 +364,8 @@ private Map> estimateSizes(final VolumeManager vm, final Map> ais = Collections.synchronizedMap(new TreeMap<>()); -ExecutorService threadPool = ThreadPools.getServerThreadPools().getPoolBuilder("estimateSizes") -.numCoreThreads(numThreads).build(); +ExecutorService threadPool = ThreadPools.getServerThreadPools() + .getPoolBuilder("bulk.import.estimate.sizes.pool").numCoreThreads(numThreads).build(); Review Comment: Should this use the METRICS_POOL_PREFIX? ## server/base/src/main/java/org/apache/accumulo/server/conf/store/impl/PropCacheCaffeineImpl.java: ## @@ -45,8 +46,8 @@ public class PropCacheCaffeineImpl implements PropCache { public static final int EXPIRE_MIN = 60; private static final Logger log = LoggerFactory.getLogger(PropCacheCaffeineImpl.class); private static final Executor executor = - ThreadPools.getServerThreadPools().getPoolBuilder("caffeine-tasks").numCoreThreads(1) - .numMaxThreads(20).withTimeOut(60L, SECONDS).build(); + ThreadPools.getServerThreadPools().getPoolBuilder(METRICS_POOL_PREFIX + "caffeine.task") Review Comment: Would it be useful to include something in the name that indicates the pool is used for a cache that is related to properties? ## core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java: ## @@ -71,9 +72,9 @@ protected TabletServerBatchReader(ClientContext context, Class scopeClass, Ta this.tableName = tableName; this.numThreads = numQueryThreads; -queryThreadPool = -context.threadPools().getPoolBuilder("batch scanner " + batchReaderInstance + "-") -.numCoreThreads(numQueryThreads).build(); +queryThreadPool = context.threadPools() +.getPoolBuilder(METRICS_POOL_PREFIX + "tserver.batch.reader.scanner" + batchReaderInstance) Review Comment: Could drop `tserver` in the name as these threads run in client code, usually outside if tserver. Could use `client` instead to indicate a client side thread pool. ## server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java: ## @@ -335,31 +340,31 @@ public TabletServerResourceManager(ServerContext context, TabletHostingServer ts Property.TSERV_ASSIGNMENT_MAXCONCURRENT, enableMetrics); modifyThreadPoolSizesAtRuntime( () -> context.getConfiguration().getCount(Property.TSERV_ASSIGNMENT_MAXCONCURRENT), -"tablet assignment", assignmentPool); +"tserver.tablet.assignment.pool", assignmentPool); Review Comment: Why was this change made? Getting lost in the diffs trying to understand the bigger picture. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact
Re: [PR] Use Duration instead of long+TimeUnit in ReadOnlyTStore.unreserve [accumulo]
DomGarguilo merged PR #4371: URL: https://github.com/apache/accumulo/pull/4371 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Normalize metrics names [accumulo]
keith-turner commented on code in PR #4535: URL: https://github.com/apache/accumulo/pull/4535#discussion_r1613826966 ## core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java: ## @@ -495,8 +496,8 @@ public void addSplits(String tableName, SortedSet partitionKeys) CountDownLatch latch = new CountDownLatch(splits.size()); AtomicReference exception = new AtomicReference<>(null); -ExecutorService executor = - context.threadPools().getPoolBuilder("addSplits").numCoreThreads(16).build(); +ExecutorService executor = context.threadPools() +.getPoolBuilder(METRICS_POOL_PREFIX + "table.ops.add.splits").numCoreThreads(16).build(); Review Comment: That sounds like a nice goal. Was asking because the description of the PR was focused on metrics. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] speed up merge operation for lots of tablets [accumulo]
keith-turner merged PR #4574: URL: https://github.com/apache/accumulo/pull/4574 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] improves fate execution of in progress transactions [accumulo]
keith-turner commented on code in PR #4589: URL: https://github.com/apache/accumulo/pull/4589#discussion_r1613822956 ## core/src/main/java/org/apache/accumulo/core/fate/MetaFateStore.java: ## @@ -356,9 +358,9 @@ protected FateInstanceType getInstanceType() { } @Override - protected Stream getTransactions() { + protected Stream getTransactions(Set statuses) { Review Comment: Possibly. Its tricky, many times using streams is fine because of how the code is called. In this case I don't have a good sense of if the stream is impacting things significantly. It could impact an operation that wants to zip through all of the fate ids as quickly as possible. For running fate ops, suspect list the fate op ids is not the bottleneck. However there is other code that uses this that is not running fate ops, but is examining them so not sure about that code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Enable background refresh for the scan server tablet metadata cache [accumulo]
keith-turner commented on code in PR #4551: URL: https://github.com/apache/accumulo/pull/4551#discussion_r1613816597 ## server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java: ## @@ -247,9 +247,31 @@ public ScanServer(ScanServerOpts opts, String[] args) { LOG.warn( "Tablet metadata caching less than one minute, may cause excessive scans on metadata table."); } - tabletMetadataCache = - Caffeine.newBuilder().expireAfterWrite(cacheExpiration, TimeUnit.MILLISECONDS) - .scheduler(Scheduler.systemScheduler()).recordStats().build(tabletMetadataLoader); + + // Get the cache refresh percentage property + // Value must be less than 100% as 100 or over would effectively disable it + double cacheRefreshPercentage = + getConfiguration().getFraction(Property.SSERV_CACHED_TABLET_METADATA_REFRESH_PERCENT); + Preconditions.checkArgument(cacheRefreshPercentage < cacheExpiration, + "Tablet metadata cache refresh percentage is '%s' but must be less than 1", + cacheRefreshPercentage); + + var builder = Caffeine.newBuilder().expireAfterWrite(cacheExpiration, TimeUnit.MILLISECONDS) + .scheduler(Scheduler.systemScheduler()).executor(context.getScheduledExecutor()) Review Comment: A dedicated thread pool for the cache would be nice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] speed up merge operation for lots of tablets [accumulo]
keith-turner commented on PR #4574: URL: https://github.com/apache/accumulo/pull/4574#issuecomment-2130056612 A lot of the changes that were in this PR are now in the commits 219d4f224e1d97d2bd7cd5dec5ec5968e89facef and ef213b520b2485f356cec7fd65f3ff4b66f43840 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add service status as command option to admin command [accumulo]
EdColeman merged PR #4567: URL: https://github.com/apache/accumulo/pull/4567 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] improves fate execution of in progress transactions [accumulo]
keith-turner merged PR #4589: URL: https://github.com/apache/accumulo/pull/4589 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Log when a tablet has been in the migrating state for an extended period [accumulo]
dlmarion closed issue #4539: Log when a tablet has been in the migrating state for an extended period URL: https://github.com/apache/accumulo/issues/4539 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]
dlmarion merged PR #4558: URL: https://github.com/apache/accumulo/pull/4558 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] adds iterator for filtering on fate key type [accumulo]
keith-turner merged PR #4590: URL: https://github.com/apache/accumulo/pull/4590 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Add IT to verify Scan Servers remove their references on shutdown [accumulo]
dlmarion commented on issue #4594: URL: https://github.com/apache/accumulo/issues/4594#issuecomment-2129945412 An IT already exists for this, see https://github.com/apache/accumulo/blob/2.1/test/src/main/java/org/apache/accumulo/test/ScanServerMetadataEntriesIT.java#L140 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add metrics for compactor entries read and write [accumulo]
DomGarguilo closed pull request #4572: Add metrics for compactor entries read and write URL: https://github.com/apache/accumulo/pull/4572 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] replaces sorted map w/ list for key vals in tablet metadata [accumulo]
keith-turner merged PR #4600: URL: https://github.com/apache/accumulo/pull/4600 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add new scan metrics to flaky list in MetricsIT [accumulo]
EdColeman merged PR #4601: URL: https://github.com/apache/accumulo/pull/4601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Add new scan metrics to flaky list in MetricsIT [accumulo]
EdColeman opened a new pull request, #4601: URL: https://github.com/apache/accumulo/pull/4601 - METRICS_SCAN_RESERVATION_WRITEOUT_TIMER - METRICS_SCAN_RESERVATION_CONFLICT_COUNTER -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] accumulo-cluster does not start a CompactionCoordinator on different host [accumulo]
dlmarion commented on issue #4597: URL: https://github.com/apache/accumulo/issues/4597#issuecomment-2129819681 I was unable to replicate this. I modified `accumulo-cluster` locally to emit debug output and so that it would always take the `ssh` path. I set the hosts in cluster.yaml to the public IP address of my dev box, but set my manager address to `localhost`. I saw the ssh command come across and compaction-coordinator logs appeared in the logs directory (Accumulo wouldn't start because I had ZK and Hadoop down). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Add ability to filter out metrics [accumulo]
keith-turner commented on issue #4599: URL: https://github.com/apache/accumulo/issues/4599#issuecomment-2129582096 > We don't necessarily have to implement something to provide some base functionality. A user can extend one of our existing implementations or create their own to provide this functionality. Wondering if it would be useful to provide some kind of help in the SPI for building micrometer filters related to Accumulo metrics. One possible way to do this would be to have constants related to metrics in the SPI. This could enable writing a MeterRegistryFactory that manipulates Accumulo metrics using those constants. Could have something like the following. ```java package org.apache.accumulo.core.spi.metrics; class MetricsConstants { static class Tags { String INSTANCE_TAG = "instance.name"; } static class Meters { String COMPACTIONS_QUEUED = "..."; } } ``` With the above then could have a scenario like the following. * Custom MeterRegistryFactory is written against 3.1 that drops the instance name tag. Its impl uses the `INSTANCE_TAG` constant from the SPI. * Accumulo 4.0 makes a major changes to the instance name tag that user need to consider. This change deletes INSTANCE_TAG constant from the SPI. * The custom MeterRegistryFactory written against 3.1 no longer compiles against 4.0 forcing an investigation of the change. Completely unrelated to metrics, but if this approach of putting metrics related constants in the SPI works then it could also potentially work for Accumulo's properties. Could have constants related to those in the SPI maybe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]
dlmarion commented on code in PR #4558: URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613492443 ## server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java: ## @@ -164,9 +167,10 @@ public long getDataSourceDeletions() { } private enum CloseState { -OPEN, CLOSING, CLOSED, COMPLETE +OPEN, REQUESTED, CLOSING, CLOSED, COMPLETE } + private volatile long closeRequestTime = 0; Review Comment: Removed volatile modifier in 3fa4f87 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]
dlmarion commented on code in PR #4558: URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613492183 ## server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java: ## @@ -905,6 +909,19 @@ public void close(boolean saveState) throws IOException { void initiateClose(boolean saveState) { log.trace("initiateClose(saveState={}) {}", saveState, getExtent()); +synchronized (this) { + if (closeState == CloseState.OPEN) { +closeRequestTime = System.nanoTime(); + } else if (closeRequestTime != 0) { +long runningTime = Duration.ofNanos(System.nanoTime() - closeRequestTime).toMinutes(); +if (runningTime >= 15) { + DEDUPE_LOGGER.info("Tablet {} close requested again, but has been closing for {} minutes", + this.extent, runningTime); +} + } + closeState = CloseState.REQUESTED; Review Comment: Change made in 3fa4f87 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]
dlmarion commented on code in PR #4558: URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613491599 ## core/src/main/java/org/apache/accumulo/core/logging/ConditionalLogger.java: ## @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * https://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.accumulo.core.logging; + +import java.time.Duration; +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.ConcurrentMap; +import java.util.function.BiFunction; + +import org.apache.accumulo.core.util.Pair; +import org.slf4j.Logger; +import org.slf4j.Marker; +import org.slf4j.event.Level; +import org.slf4j.helpers.AbstractLogger; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Caffeine; + +/** + * Logger that wraps another Logger and only emits a log message once per the supplied duration. + * + */ +public abstract class ConditionalLogger extends AbstractLogger { + + private static final long serialVersionUID = 1L; + + /** + * A Logger implementation that will log a message at the supplied elevated level if it has not + * been seen in the supplied duration. For repeat occurrences the message will be logged at the + * level used in code (which is likely a lower level). Note that the first log message will be + * logged at the elevated level because it has not been seen before. + */ + public static class EscalatingLogger extends DeduplicatingLogger { + +private static final long serialVersionUID = 1L; +private final Level elevatedLevel; + +public EscalatingLogger(Logger log, Duration threshold, Level elevatedLevel) { + super(log, threshold); + this.elevatedLevel = elevatedLevel; +} + +@Override +protected void handleNormalizedLoggingCall(Level level, Marker marker, String messagePattern, +Object[] arguments, Throwable throwable) { + + if (arguments == null) { +arguments = new Object[0]; + } + if (!condition.apply(messagePattern, Arrays.asList(arguments))) { + delegate.atLevel(level).addMarker(marker).setCause(throwable).log(messagePattern, +arguments); + } else { + delegate.atLevel(elevatedLevel).addMarker(marker).setCause(throwable).log(messagePattern, +arguments); + } + +} + + } + + /** + * A Logger implementation that will suppress duplicate messages within the supplied duration. + */ + public static class DeduplicatingLogger extends ConditionalLogger { + +private static final long serialVersionUID = 1L; + +public DeduplicatingLogger(Logger log, Duration threshold) { + super(log, new BiFunction<>() { + +private final Cache>,Boolean> cache = + Caffeine.newBuilder().expireAfterWrite(threshold).maximumSize(250).build(); Review Comment: I made the cache size configurable in 3fa4f87, and set it to 1000 in the stuck logger in Tablet -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Log message when Tablet has been unloading for over 15 minutes [accumulo]
dlmarion commented on code in PR #4558: URL: https://github.com/apache/accumulo/pull/4558#discussion_r1613490374 ## server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java: ## @@ -905,6 +909,19 @@ public void close(boolean saveState) throws IOException { void initiateClose(boolean saveState) { log.trace("initiateClose(saveState={}) {}", saveState, getExtent()); +synchronized (this) { + if (closeState == CloseState.OPEN) { +closeRequestTime = System.nanoTime(); + } else if (closeRequestTime != 0) { Review Comment: Implemented this change in 3fa4f87 ## server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java: ## @@ -141,6 +143,7 @@ */ public class Tablet extends TabletBase { private static final Logger log = LoggerFactory.getLogger(Tablet.class); + private static final Logger DEDUPE_LOGGER = new DeduplicatingLogger(log, Duration.ofMinutes(5)); Review Comment: Renamed logger in 3fa4f87 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] handles failure in compaction planner [accumulo]
dlmarion commented on code in PR #4586: URL: https://github.com/apache/accumulo/pull/4586#discussion_r1613446675 ## server/base/src/main/java/org/apache/accumulo/server/compaction/CompactionJobGenerator.java: ## @@ -162,23 +166,29 @@ public Map getExecutionHints() { return dispatcher.dispatch(dispatchParams).getService(); } + private Level getLevel(TableId tableId, CompactionServiceId serviceId, Review Comment: Curious if you can use the EscalatingLogger ( or some other variant of ConditionalLogger) that's included in #4558 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] improves fate execution of in progress transactions [accumulo]
dlmarion commented on code in PR #4589: URL: https://github.com/apache/accumulo/pull/4589#discussion_r1613395827 ## core/src/main/java/org/apache/accumulo/core/fate/MetaFateStore.java: ## @@ -356,9 +358,9 @@ protected FateInstanceType getInstanceType() { } @Override - protected Stream getTransactions() { + protected Stream getTransactions(Set statuses) { Review Comment: If the changes in this PR are for performance, I wonder if we should turn this stream back into a for loop. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Modified FATE to allow transactions to run to completion without interruption [accumulo]
dlmarion closed pull request #3852: Modified FATE to allow transactions to run to completion without interruption URL: https://github.com/apache/accumulo/pull/3852 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Modified FATE to allow transactions to run to completion without interruption [accumulo]
dlmarion commented on PR #3852: URL: https://github.com/apache/accumulo/pull/3852#issuecomment-2129358394 Superceded by #4589 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Add ability to filter out metrics [accumulo]
dlmarion commented on issue #4599: URL: https://github.com/apache/accumulo/issues/4599#issuecomment-2129336830 So, that's a fair point. We don't necessarily have to implement something to provide some base functionality. A user can extend one of our existing implementations or create their own to provide this functionality. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fixed NPE in ScanServerMetrics [accumulo]
dlmarion merged PR #4598: URL: https://github.com/apache/accumulo/pull/4598 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org