Brett,

We did do some tweaking to the cron schedules for both snapshots and internal yesterday; that's probably what initiated the scan. I'm guessing a directory scan of snapshots is also sitting in the queue, waiting for the internal scan to finish.

We will probably bounce Archiva to stop these scans and clear the queues. Is there any harmful side effect to bouncing during a scan? I think we've done it before without impact.

As an enhancement, an admin button to abort an in-progress scan would be useful.

Thanks,
David

On 7/23/14, 12:59 AM, "Brett Porter" <[email protected]> wrote:

>From a quick look at the code, it looks like that scan will happen
>whenever the configuration for the repository is changed. Is that what
>happened for you?
>
>Not sure if that was intentional or not.
>
>- Brett
>
>On 23 Jul 2014, at 7:13 am, Stallard,David <[email protected]> wrote:
>
>> We have roughly 1.6 terabytes of data in our largest Archiva instance
>> and it grows rapidly. Because of this amount of data, and/or perhaps
>> because of limitations of our current hardware (which we are working to
>> improve), doing a full directory scan degrades performance of Archiva as
>> a whole and it can take quite a long time to complete...48 hours or more.
>>
>> Because of that, we don't do directory scans unless we feel it's
>> necessary to fix some unusual situation. The index scans are usually
>> sufficient.
>>
>> Today, a directory scan of the internal repository mysteriously started
>> up. Although the System Status page doesn't say what type of scan is
>> running, I believe it's a directory scan because the Files Processed
>> number is equal to the New Files number. This has bogged down the
>> system as expected and we're getting complaints from users about uploads
>> and downloads taking a long time.
>>
>> Looking in the log to try and find how this scan was started, I found
>> the following line:
>>
>> 2014-07-22 11:09:26,770 [pool-5-thread-1] INFO
>> org.apache.archiva.scheduler.repository.ArchivaRepositoryScanningTaskExecutor
>> [] - Executing task from queue with job name: RepositoryTask
>> [repositoryId=internal, resourceFile=null, scanAll=true,
>> updateRelatedArtifacts=false]
>>
>> This seems to indicate that either the scheduler kicked it off, or at
>> some point in the past a directory scan was added to the queue and it is
>> just now being processed. I don't know if the latter is even possible
>> or not...I thought that the stuff in the queue was individual artifacts
>> that had been marked by scans for later processing.
>>
>> Our Cron Expression for the internal repository is the following, which
>> should not have kicked off a scan at the time shown above. However,
>> even if it did, I believe that the Cron Expression usually kicks off
>> index scans rather than directory scans?
>>
>> 0 0 19 * * ?
>>
>> So, two questions:
>>
>> 1. Any idea why this directory scan might have been started?
>> 2. Is there any way to stop a scan after it has started? I'm
>> assuming a bounce of Archiva would stop it, but an option that didn't
>> incur downtime would be preferable.
>>
>> Thanks,
>> David
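
For anyone tracing how scans get started from the log, here is a minimal sketch in Python that pulls the RepositoryTask fields out of an entry like the one quoted above. It assumes the log format is exactly as shown in that entry; the function name parse_repository_task is hypothetical, and treating scanAll=true as the sign of a full directory scan is an assumption based on the reading in this thread, not something confirmed from the Archiva code.

```python
import re

# Matches the "RepositoryTask [key=value, ...]" portion of the log entry quoted above.
# Assumption: the fields never contain a closing bracket themselves.
TASK_RE = re.compile(r"RepositoryTask\s*\[(?P<fields>[^\]]*)\]")


def parse_repository_task(entry):
    """Return the RepositoryTask fields of a log entry as a dict, or None if absent."""
    match = TASK_RE.search(entry)
    if match is None:
        return None
    fields = {}
    for pair in match.group("fields").split(","):
        key, _, value = pair.strip().partition("=")
        if key:
            fields[key] = value.strip()
    return fields


# The entry from the thread, rejoined onto one logical line.
entry = (
    "2014-07-22 11:09:26,770 [pool-5-thread-1] INFO "
    "org.apache.archiva.scheduler.repository.ArchivaRepositoryScanningTaskExecutor [] - "
    "Executing task from queue with job name: RepositoryTask "
    "[repositoryId=internal, resourceFile=null, scanAll=true, "
    "updateRelatedArtifacts=false]"
)

task = parse_repository_task(entry)
if task is not None and task.get("scanAll") == "true":
    # Assumption: scanAll=true marks a full scan rather than an incremental one.
    print("full scan queued for repository:", task["repositoryId"])
```

Running the same function over every "Executing task from queue" entry in the log would show which repositories had a scanAll=true task queued and when, which can then be lined up against configuration changes.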
