Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
Dear Stephen Thompson,

In message 50982f94.10...@seismo.berkeley.edu you wrote:
> On 11/05/2012 01:17 PM, Josh Fisher wrote:
>> On 11/5/2012 11:03 AM, Stephen Thompson wrote:
>> ...
>> When you start the jobs manually, I assume you are starting them at
>> different times. This works, because the first job is up and running
>> with the volume loaded before the second job begins its selection
>> process. One way to handle this issue is to have a different Schedule
>> for each job and start the jobs at different times with one second
>> spacing. Jobs will still run concurrently, they just won't start up
>> concurrently.
>
> I suspected something like that, but would ask out loud if bacula runs
> into a contention like that and there are other available volumes in
> the requested pool, why doesn't it decide to use another volume
> instead of blocking?

I'm not sure about this. I see very similar problems when trying to
use the second drive for other purposes, like labelling new tapes,
while a job is running on the first one, like this:

*label dummy pool=ARCH storage=LTOLIB drive=1 slots=25 barcodes
Connecting to Storage daemon LTOLIB at ltos.denx.de:9103 ...
3306 Issuing autochanger slots command.
Device LTO3-0 has 48 slots.
Connecting to Storage daemon LTOLIB at ltos.denx.de:9103 ...
3306 Issuing autochanger list command.
The following Volumes will be labeled:
Slot  Volume
==============
  25  SAV000L3
Do you want to label these Volumes? (yes|no): yes
Connecting to Storage daemon LTOLIB at ltos.denx.de:9103 ...
Sending label command for Volume SAV000L3 Slot 25 ...
3937 Device LTO3-0 (/dev/tape/by-id/scsi-35000e11802947001-nst) is busy with writers=1 reserved=0.
Label command failed for Volume SAV000L3.

with:

Device {
  Name = LTO3-0
  Media Type = LTO-3
  Archive Device = /dev/tape/by-id/scsi-35000e11802947001-nst
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  Maximum File Size = 5GB
  Maximum Block Size = 512K
  Changer Command = /usr/libexec/bacula/mtx-changer %c %o %S %a %d
  Changer Device = /dev/tape/by-id/scsi-1BDT_FlexStor_II_00DE64100465_LL0
  AutoChanger = yes
  # Enable the Alert command only if you have the mtx package loaded
  Alert Command = sh -c 'tapeinfo -f %c |grep TapeAlert|cat'
  # If you have smartctl, enable this, it has more info than tapeinfo
  # Alert Command = sh -c 'smartctl -H -l error %c'
  # Spool Data to disk before writing to tape
  Spool Directory = /backup/spool
  Maximum Spool Size = 5120GB
  Maximum Job Spool Size = 5120GB
}

Device {
  Name = LTO3-1
  Media Type = LTO-3
  Archive Device = /dev/tape/by-id/scsi-35000e11802947004-nst
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  Maximum File Size = 5GB
  Maximum Block Size = 512K
  Changer Command = /usr/libexec/bacula/mtx-changer %c %o %S %a %d
  Changer Device = /dev/tape/by-id/scsi-1BDT_FlexStor_II_00DE64100465_LL0
  AutoChanger = yes
  # Enable the Alert command only if you have the mtx package loaded
  # Alert Command = sh -c 'tapeinfo -f %c |grep TapeAlert|cat'
  # If you have smartctl, enable this, it has more info than tapeinfo
  Alert Command = sh -c 'smartctl -H -l error %c'
  # Spool Data to disk before writing to tape
  Spool Directory = /backup/spool
  Maximum Spool Size = 5120GB
  Maximum Job Spool Size = 5120GB
}

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
Drawing on my fine command of language, I said nothing.
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
> I've had the following problem for ages (meaning multiple major
> revisions of bacula) and I've seen this come up from time to time on
> the mailing list, but I've never actually seen a resolution (please
> point me to one if it's been found).
>
> background:
>
> I run monthly Fulls and nightly Incrementals. I have a 2 drive
> autochanger dedicated to my Incrementals. I launch something like
> ~150 Incremental jobs each night. I am configured for 8 concurrent
> jobs on the Storage Daemon.
>
> PROBLEM:
>
> The first job(s) grab one of the 2 devices available in the changer
> (which is set to AutoSelect) and either load a tape, or use a tape
> from the previous evening. All tapes in the changer are in the same
> Incremental-Pool.
>
> The second job(s) grab the other of the 2 devices available in the
> changer, but want to use the same tape that's just been mounted (or
> put into use) on the jobs that got launched first. They will often
> literally wait the entire evening until 100's of jobs run through on
> only one device, until that tape is freed up, at which point it is
> unmounted from the first device and moved to the second.
>
> Note, the behaviour seems to be to round-robin my 8 concurrency limit
> between the 2 available drives, which means 4 jobs will run, and 4
> jobs will block waiting for the wanted Volume. When the original 4
> jobs are completed (not at the same time) additional jobs are
> launched that keep that wanted Volume in use.
>
> LOG:
>
> 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.2012-11-03_22.00.00_04
> 03-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-0
> 03-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information.
> 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload slot 82, drive 0 command.
> 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
> 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
> 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium found
> . . .
>
> CONFIGS (partial and seem pretty straight-forward):
>
> Schedule {
>   Name = DefaultSchedule
>   Run = Level=Incremental sat-thu at 22:00
>   Run = Level=Differential fri at 22:00
> }
>
> JobDefs {
>   Name = DefaultJob
>   Type = Backup
>   Level = Full
>   Schedule = DefaultSchedule
>   Incremental Backup Pool = Incremental-Pool
>   Differential Backup Pool = Incremental-Pool
> }
>
> Pool {
>   Name = Incremental-Pool
>   Pool Type = Backup
>   Storage = L100-changer
> }
>
> Storage {
>   Name = L100-changer
>   Device = L100-changer
>   Media Type = LTO-3
>   Autochanger = yes
>   Maximum Concurrent Jobs = 8
> }
>
> Autochanger {
>   Name = L100-changer
>   Device = L100-Drive-0
>   Device = L100-Drive-1
>   Changer Device = /dev/L100-changer
> }
>
> Device {
>   Name = L100-Drive-0
>   Drive Index = 0
>   Media Type = LTO-3
>   Archive Device = /dev/L100-Drive-0
>   AutomaticMount = yes;
>   AlwaysOpen = yes;
>   RemovableMedia = yes;
>   RandomAccess = no;
>   AutoChanger = yes;
>   AutoSelect = yes;
> }
>
> Device {
>   Name = L100-Drive-1
>   Drive Index = 0
>   Media Type = LTO-3
>   Archive Device = /dev/L100-Drive-1
>   AutomaticMount = yes;
>   AlwaysOpen = yes;
>   RemovableMedia = yes;
>   RandomAccess = no;
>   AutoChanger = yes;
>   AutoSelect = yes;
> }

I do not have a good solution but I know by default bacula does not
want to load the same pool into more than 1 storage device at the same
time.

John
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/5/12 7:59 AM, John Drescher wrote:
>> I've had the following problem for ages (meaning multiple major
>> revisions of bacula) [...]
>
> I do not have a good solution but I know by default bacula does not
> want to load the same pool into more than 1 storage device at the
> same time.
>
> John

I think it's something in the automated logic, because if I launch
jobs by hand (same pool across 2 tape devices in same autochanger)
everything works fine. I think it has more to do with the Scheduler
assigning the same Volume to all jobs and then not wanting to change
that choice if that Volume is in use.

If I do a status on the Director, for instance, and see the jobs for
the next day lined up in Scheduled jobs, they all have the same Volume
listed.

thanks,
Stephen
--
Stephen Thompson               Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/05/12 08:03, Stephen Thompson wrote:
> On 11/5/12 7:59 AM, John Drescher wrote:
>>> I've had the following problem for ages (meaning multiple major
>>> revisions of bacula) [...]
>>
>> I do not have a good solution but I know by default bacula does not
>> want to load the same pool into more than 1 storage device at the
>> same time.
>>
>> John
>
> I think it's something in the automated logic. Because if I launch
> jobs by hand (same pool across 2 tape devices in same autochanger)
> everything works fine. I think it has more to do with the Scheduler
> assigning the same Volume to all jobs and then not wanting to change
> that choice if that Volume is in use.

I also use Accurate backups, which can sometimes take a bit before the
job gets back to volume/drive assignments, so it might be a race
condition: when the blocking jobs start, they still want the same
Volume as the jobs that run, because the jobs that run are still
setting up Accurate backup and haven't been solidly assigned that
Volume yet.

I don't know. It's rather annoying, especially as we attempt to ramp
up our backup capacity.

Lastly, it doesn't ALWAYS happen, though it does seem to happen more
often than not.

> If I do a status on the Director for instance and see the jobs for
> the next day lined up in Scheduled jobs, they all have the same
> Volume listed.
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/5/2012 11:03 AM, Stephen Thompson wrote:
> On 11/5/12 7:59 AM, John Drescher wrote:
>>> I've had the following problem for ages (meaning multiple major
>>> revisions of bacula) [...]
>>
>> I do not have a good solution but I know by default bacula does not
>> want to load the same pool into more than 1 storage device at the
>> same time.
>>
>> John
>
> I think it's something in the automated logic. Because if I launch
> jobs by hand (same pool across 2 tape devices in same autochanger)
> everything works fine. I think it has more to do with the Scheduler
> assigning the same Volume to all jobs and then not wanting to change
> that choice if that Volume is in use.

When both jobs start at the same time and same priority, they see the
same exact next available volume for the pool, and so both select the
same volume. When they select different drives, it is a problem, since
the volume can only be in one drive.

When you start the jobs manually, I assume you are starting them at
different times. This works, because the first job is up and running
with the volume loaded before the second job begins its selection
process. One way to handle this issue is to have a different Schedule
for each job and start the jobs at different times with one second
spacing. Jobs will still run concurrently, they just won't start up
concurrently.
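
The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Bacula's actual selection code: the functions select_volume and run are hypothetical, the second volume name IM0109 is made up, and the model assumes that a job starting later treats an already mounted volume as taken and moves on to the next one in the pool.

```python
def select_volume(pool, in_use):
    """Return the first volume in pool order that is not mounted in any drive."""
    for vol in pool:
        if vol not in in_use:
            return vol
    return None


def run(pool, drives, simultaneous):
    """Start one job per drive and report what each job ends up doing.

    simultaneous=True: every job picks its volume before any volume is
    mounted, so all jobs see the same "next available" volume.
    simultaneous=False: each job picks only after the previous job has
    mounted its volume.
    """
    in_use = {}  # volume name -> drive that currently holds it
    # Snapshot of choices made at the same instant (empty when staggered).
    picks = {d: select_volume(pool, in_use) for d in drives} if simultaneous else {}
    result = {}
    for drive in drives:
        vol = picks[drive] if simultaneous else select_volume(pool, in_use)
        if vol is None:
            result[drive] = "no volume available"
        elif vol in in_use:
            # A tape can only sit in one drive, so this job has to wait.
            result[drive] = f"blocked: {vol} is in use by {in_use[vol]}"
        else:
            in_use[vol] = drive
            result[drive] = f"writing to {vol}"
    return result


if __name__ == "__main__":
    pool = ["IM0108", "IM0109"]  # IM0109 is a made-up second volume
    drives = ["L100-Drive-0", "L100-Drive-1"]
    print(run(pool, drives, simultaneous=True))
    print(run(pool, drives, simultaneous=False))
```

In this model the simultaneous start leaves the second drive blocked on the volume the first drive already holds, while the staggered start lets each drive take its own volume. Whether real Bacula jobs pick a different volume or join the already mounted one depends on settings this sketch does not model.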
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/05/2012 01:17 PM, Josh Fisher wrote:
> On 11/5/2012 11:03 AM, Stephen Thompson wrote:
>> On 11/5/12 7:59 AM, John Drescher wrote:
>>>> I've had the following problem for ages (meaning multiple major
>>>> revisions of bacula) [...]
>>>
>>> I do not have a good solution but I know by default bacula does not
>>> want to load the same pool into more than 1 storage device at the
>>> same time.
>>>
>>> John
>>
>> I think it's something in the automated logic. Because if I launch
>> jobs by hand (same pool across 2 tape devices in same autochanger)
>> everything works fine. I think it has more to do with the Scheduler
>> assigning the same Volume to all jobs and then not wanting to change
>> that choice if that Volume is in use.
>
> When both jobs start at the same time and same priority, they see the
> same exact next available volume for the pool, and so both select the
> same volume. When they select different drives, it is a problem,
> since the volume can only be in one drive.
>
> When you start the jobs manually, I assume you are starting them at
> different times. This works, because the first job is up and running
> with the volume loaded before the second job begins its selection
> process. One way to handle this issue is to have a different Schedule
> for each job and start the jobs at different times with one
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/5/2012 4:28 PM, Stephen Thompson wrote:
> On 11/05/2012 01:17 PM, Josh Fisher wrote:
>> On 11/5/2012 11:03 AM, Stephen Thompson wrote:
>>> On 11/5/12 7:59 AM, John Drescher wrote:
>>>>> I've had the following problem for ages (meaning multiple major
>>>>> revisions of bacula) [...]
>>>>
>>>> I do not have a good solution but I know by default bacula does
>>>> not want to load the same pool into more than 1 storage device at
>>>> the same time.
>>>>
>>>> John
>>>
>>> I think it's something in the automated logic. Because if I launch
>>> jobs by hand (same pool across 2 tape devices in same autochanger)
>>> everything works fine. I think it has more to do with the Scheduler
>>> assigning the same Volume to all jobs and then not wanting to
>>> change that choice if that Volume is in use.
>>
>> When both jobs start at the same time and same priority, they see
>> the same exact next available volume for the pool, and so both
>> select the same volume. When they select different drives, it is a
>> problem, since the volume can only be in one drive.
>>
>> When you start the jobs manually, I assume you are starting them at
>> different times. This works, because the first job is up and running
>> with the volume loaded before the second job begins its selection
>> process. One way to handle this issue is to
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
In the message dated: Mon, 05 Nov 2012 13:28:52 PST, The pithy ruminations from Stephen Thompson on Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1 were: = On 11/05/2012 01:17 PM, Josh Fisher wrote: = = On 11/5/2012 11:03 AM, Stephen Thompson wrote: = = On 11/5/12 7:59 AM, John Drescher wrote: = I've had the following problem for ages (meaning multiple major = revisions of bacula) and I've seen this come up from time to time on the = mailing list, but I've never actually seen a resolution (please point me = to one if it's been found). If I understand your question correctly, I've seen the same behavior in our installation, and also haven't found an answer. [SNIP!] = = I do not have a good solution but I know by default bacula does not = want to load the same pool into more than 1 storage device at the same = time. = = John = = I think it's something in the automated logic. Because if I launch jobs = by hand (same pool across 2 tapes devices in same autochanger) = everything works fine. I think it has more to do with the Scheduler = assigning the same same Volume to all jobs and then not wanting to = change that choice if that Volume is in use. = = When both jobs start at the same time and same priority, they see the = same exact next available volume for the pool, and so both select the = same volume. When they select different drives, it is a problem, since = the volume can only be in one drive. That's the best explanation I've seen so far. = = When you start the jobs manually, I assume you are starting them at = different times. This works, because the first job is up and running = with the volume loaded before the second job begins its selection = process. One way to handle this issue is to have a different Schedule = for each job and start the jobs at different times with one second = spacing. Jobs will still run concurrently, they just won't start up = concurrently. 
I suggest that the proper way to handle this is within Bacula's scheduler, not by having people manually add random numbers to each scheduler entry to avoid contention. That approach doesn't scale on large installations, it makes scheduler maintenance much more difficult, and it's the sort of thing that computers do well and humans do poorly.

While we often think of computers as multi-tasking and able to handle events simultaneously, this is a real-world example of contention for physical resources, and this problem could probably be resolved within the scheduler easily. For example, the logic to start jobs could add a delay to consecutive jobs with an identical start time, as in:

    foreach ( job at time X )
        runjob
        pause 30 s
    done

The arbitrary delay of 30 seconds (in this example) is of no consequence over the duration of a typical backup, but it has a great impact in allowing the hardware to operate serially, so that bacula can properly determine what media and drives are available when the next job with the same scheduled start time actually begins running.

= I suspected something like that, but would ask out loud if bacula runs
= into a contention like that and there are other available volumes in the
= requested pool, why doesn't it decide to use another volume instead of
= blocking?

Absolutely.

= It's also disappointing, because we've already pulled virtually all of
= our scheduling outside of bacula into scripts because the logic seldom
= works out for us. This may be another case of that. I'm surprised this
= isn't a more common concern. What could be more run-of-the-mill than

My impression is that this is a common concern among bacula users. Perhaps it's been difficult to determine the real problem, and that's why there hasn't been a solution. Alternatively, there may be multiple conditions that trigger the same behavior.

= having a nightly incremental pool within an autochanger with multiple
= drives?
=
= thanks!
= Stephen
=
= If I do a status on the Director for instance and see the jobs for the
= next day lined up in Scheduled jobs, they all have the same Volume listed.
=
= thanks,
= Stephen
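For anyone already driving job starts from an external script, the staggered start described in the pseudocode above could be sketched roughly as follows in Python. This is only an illustration, not Bacula's scheduler code: `start_staggered` and the `run_job` callback are hypothetical names, and the 30-second default is simply the arbitrary figure from the pseudocode. How `run_job` actually submits a job (for example through the console) is left to the caller.

```python
import time


def start_staggered(job_names, run_job, delay_s=30.0, sleep=time.sleep):
    """Start jobs that share one scheduled time, one after another.

    job_names: names of the jobs due at the same time (hypothetical input).
    run_job:   caller-supplied callback that submits a single job.
    delay_s:   pause between consecutive starts; 30 s is an arbitrary value.
    sleep:     injectable so the pause can be replaced in tests.
    """
    for index, name in enumerate(job_names):
        if index > 0:
            # Give the previous job time to reserve its drive and volume
            # before the next job begins its own selection.
            sleep(delay_s)
        run_job(name)


if __name__ == "__main__":
    # Tiny delay so the demonstration finishes quickly.
    start_staggered(
        ["client-a", "client-b", "client-c"],
        run_job=lambda name: print("starting", name),
        delay_s=0.1,
    )
```

The jobs still overlap once they are running; only their start-up, and therefore their volume and drive selection, is serialized.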
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
Going to try this out.

Stephen

On 11/05/2012 02:40 PM, Josh Fisher wrote:
On 11/5/2012 4:28 PM, Stephen Thompson wrote:
On 11/05/2012 01:17 PM, Josh Fisher wrote:
On 11/5/2012 11:03 AM, Stephen Thompson wrote:
On 11/5/12 7:59 AM, John Drescher wrote:

I've had the following problem for ages (meaning multiple major revisions of bacula) and I've seen this come up from time to time on the mailing list, but I've never actually seen a resolution (please point me to one if it's been found).

background:

I run monthly Fulls and nightly Incrementals. I have a 2 drive autochanger dedicated to my Incrementals. I launch something like ~150 Incremental jobs each night. I am configured for 8 concurrent jobs on the Storage Daemon.

PROBLEM:

The first job(s) grab one of the 2 devices available in the changer (which is set to AutoSelect) and either load a tape, or use a tape from the previous evening. All tapes in the changer are in the same Incremental-Pool.

The second job(s) grab the other of the 2 devices available in the changer, but want to use the same tape that's just been mounted (or put into use) on the jobs that got launched first. They will often literally wait the entire evening until 100's of jobs run through on only one device, until that tape is freed up, at which point it is unmounted from the first device and moved to the second.

Note, the behaviour seems to be to round-robin my 8 concurrency limit between the 2 available drives, which means 4 jobs will run, and 4 jobs will block on waiting for the wanted Volume. When the original 4 jobs are completed (not at the same time) additional jobs are launched that keep that wanted Volume in use.

LOG:

03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.2012-11-03_22.00.00_04
03-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-0
03-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information.
03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload slot 82, drive 0 command.
03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium found
. . .

CONFIGS (partial and seem pretty straight-forward):

Schedule {
  Name = DefaultSchedule
  Run = Level=Incremental sat-thu at 22:00
  Run = Level=Differential fri at 22:00
}

JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Full
  Schedule = DefaultSchedule
  Incremental Backup Pool = Incremental-Pool
  Differential Backup Pool = Incremental-Pool
}

Pool {
  Name = Incremental-Pool
  Pool Type = Backup
  Storage = L100-changer
}

Storage {
  Name = L100-changer
  Device = L100-changer
  Media Type = LTO-3
  Autochanger = yes
  Maximum Concurrent Jobs = 8
}

Autochanger {
  Name = L100-changer
  Device = L100-Drive-0
  Device = L100-Drive-1
  Changer Device = /dev/L100-changer
}

Device {
  Name = L100-Drive-0
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-0
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
}

Device {
  Name = L100-Drive-1
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-1
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
}

I do not have a good solution but I know by default bacula does not want to load the same pool into more than 1 storage device at the same time.

John

I think it's something in the automated logic. Because if I launch jobs by hand (same pool across 2 tape devices in same autochanger) everything works fine.
I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use. When both jobs start at the same time and same priority, they see the same exact next available volume for the pool, and so both select the same volume. When they select different drives, it is a problem, since the volume can only be in one drive. When you start the jobs manually, I assume you are starting them at different times. This works, because the first job is up
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
No such luck. I already have Prefer Mounted Volumes = no set for all jobs. That's apparently not a solution.

Stephen

On 11/5/12 2:57 PM, Stephen Thompson wrote:
Going to try this out.
Stephen