From a quick look at the code, that scan appears to happen whenever the repository's configuration is changed. Is that what happened for you?
Not sure if that was intentional or not.

- Brett

On 23 Jul 2014, at 7:13 am, Stallard,David <[email protected]> wrote:

> We have roughly 1.6 terabytes of data in our largest Archiva instance and it
> grows rapidly. Because of this amount of data, and/or perhaps because of
> limitations of our current hardware (which we are working to improve), doing
> a full directory scan degrades performance of Archiva as a whole and it can
> take quite a long time to complete...48 hours or more.
>
> Because of that, we don't do directory scans unless we feel it's necessary to
> fix some unusual situation. The index scans are usually sufficient.
>
> Today, a directory scan of the internal repository mysteriously started up.
> Although the System Status page doesn't say what type of scan is running, I
> believe it's a directory scan because the Files Processed number is equal to
> the New Files number. This has bogged down the system as expected and we're
> getting complaints from users about uploads and downloads taking a long time.
>
> Looking in the log to try and find how this scan was started, I found the
> following line:
>
> 2014-07-22 11:09:26,770 [pool-5-thread-1] INFO
> org.apache.archiva.scheduler.repository.ArchivaRepositoryScanningTaskExecutor
> [] - Executing task from queue with job name: RepositoryTask
> [repositoryId=internal, resourceFile=null, scanAll=true,
> updateRelatedArtifacts=false]
>
> This seems to indicate that either the scheduler kicked it off, or at some
> point in the past a directory scan was added to the queue and it is just now
> being processed. I don't know if the latter is even possible or not...I
> thought that the stuff in the queue was individual artifacts that had been
> marked by scans for later processing.
>
> Our Cron Expression for the internal repository is the following, which
> should not have kicked off a scan at the time shown above. However, even if
> it did, I believe that the Cron Expression usually kicks off index scans
> rather than directory scans?
>
> 0 0 19 * * ?
>
> So, two questions:
>
> 1. Any idea why this directory scan might have been started?
> 2. Is there any way to stop a scan after it has started? I'm assuming a
> bounce of Archiva would stop it, but an option that didn't incur downtime
> would be preferable.
>
> Thanks,
> David
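A minimal sketch for sanity-checking the two clues in David's message, assuming the Cron Expression uses the Quartz-style, seconds-first field order (seconds, minutes, hours, day-of-month, month, day-of-week). Under that assumption "0 0 19 * * ?" means 19:00:00 every day, so it is checked against the 11:09:26 timestamp from the log; the sketch then pulls the fields out of the logged RepositoryTask. The helpers parse_cron, could_fire_at and parse_task are hypothetical illustrations, not part of Archiva, and the cron matcher only understands '*', '?' and plain integers.

```python
import re
from datetime import datetime

# Assumption: Quartz-style, seconds-first field order (6 fields, no year).
FIELDS = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"]


def parse_cron(expr):
    """Split a 6-field cron expression into named fields (hypothetical helper)."""
    parts = expr.split()
    if len(parts) != 6:
        raise ValueError("this sketch only handles 6-field expressions")
    return dict(zip(FIELDS, parts))


def field_matches(spec, value):
    # Only '*', '?' and plain integers are supported; ranges, lists and
    # steps ("1-5", "1,3", "*/15") are deliberately out of scope.
    if spec in ("*", "?"):
        return True
    if not spec.isdigit():
        raise NotImplementedError(f"unsupported cron field: {spec!r}")
    return int(spec) == value


def could_fire_at(expr, when):
    """True if the cron expression would trigger at exactly `when`."""
    f = parse_cron(expr)
    if f["day_of_week"] not in ("*", "?"):
        raise NotImplementedError("day-of-week values are not handled here")
    return (field_matches(f["second"], when.second)
            and field_matches(f["minute"], when.minute)
            and field_matches(f["hour"], when.hour)
            and field_matches(f["day_of_month"], when.day)
            and field_matches(f["month"], when.month))


# Captures the bracketed body that follows "RepositoryTask" in the log line.
TASK_RE = re.compile(r"RepositoryTask\s*\[(?P<body>[^\]]*)\]")


def parse_task(log_text):
    """Return the RepositoryTask fields as a dict of strings, or None."""
    m = TASK_RE.search(log_text)
    if m is None:
        return None
    fields = {}
    for pair in m.group("body").split(","):
        key, _, value = pair.strip().partition("=")
        fields[key] = value
    return fields


LOG = ("2014-07-22 11:09:26,770 [pool-5-thread-1] INFO "
       "org.apache.archiva.scheduler.repository.ArchivaRepositoryScanningTaskExecutor "
       "[] - Executing task from queue with job name: RepositoryTask "
       "[repositoryId=internal, resourceFile=null, scanAll=true, "
       "updateRelatedArtifacts=false]")

task = parse_task(LOG)
print(task["scanAll"])  # true
started = datetime.strptime(LOG[:19], "%Y-%m-%d %H:%M:%S")
print(could_fire_at("0 0 19 * * ?", started))  # False
```

A False result is consistent with David's reading that the 19:00 schedule did not start this task; on its own it does not say what did queue it.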
