Re: [Bacula-users] Virtual tapes or virtual disks
On 2022-01-26 20:13, dmitri maziuk wrote:
> On 2022-01-26 12:57 PM, Josip Deanovic wrote:
>> The number of files per directory is far bigger and is unlikely to get reached, especially not for this use case.
>
> The limit is one thing, the scaling is another. I agree: 40TB of 10GB files is not enough to see the slow-down on any modern system, you'd need an order of magnitude more files to get there. Still it's something to be aware of when deciding on volume size.

40 TB is 40960 GB, which would give 4096 files of 10 GB each. An order of magnitude more would be 40960 files, which is still nothing. Right now on my laptop I have 291794 files and 34481 directories, and that's only under /usr.

I had systems with hundreds of millions of files on UFS2 (FreeBSD) and systems with billions of files on ext3 (Linux), and that was about 15 years ago. As far as I can remember, there were no read/write performance issues related to the number of files. The issue was backup, which took a long time to traverse the whole file system. This is a problem common to all hierarchical databases that lack some kind of indexing to deal with it.

As long as the full path of a file is known, I don't think the read/write performance of a file would change noticeably as the number of files on the file system increases.

Modern file systems use directory indexing, so even searching through a file system doesn't take too long, but it's common sense that the time needed to perform a lookup would increase (not necessarily linearly) with the number of files on the file system.

In any case, Bacula knows the path names of the file volumes and doesn't need to search the file system. I can't imagine a setup where the number of files on the local file system containing Bacula file volumes would pose a problem.

Regards!
--
Josip Deanovic
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
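The volume-count arithmetic in this message can be checked with a few lines of Python. This is only a sketch of the calculation as stated above: it assumes 1 TB is counted as 1024 GB (which is how 40 TB becomes 40960 GB) and that the whole 40 TB is filled with 10 GB volume files. The variable names are illustrative.

```python
# Figures from the message: 40 TB of storage, 10 GB per file volume.
# Assumption: 1 TB is counted as 1024 GB, as in the message.
TOTAL_TB = 40
VOLUME_GB = 10
GB_PER_TB = 1024

total_gb = TOTAL_TB * GB_PER_TB      # total capacity in GB
volumes = total_gb // VOLUME_GB      # number of 10 GB volume files
ten_times = volumes * 10             # one order of magnitude more files

print(total_gb, volumes, ten_times)
```

Even the tenfold figure is well below the file counts mentioned for a single /usr tree.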
Re: [Bacula-users] Virtual tapes or virtual disks
On 2022-01-26 12:57 PM, Josip Deanovic wrote:
> The number of files per directory is far bigger and is unlikely to get reached, especially not for this use case.

The limit is one thing, the scaling is another. I agree: 40TB of 10GB files is not enough to see the slow-down on any modern system, you'd need an order of magnitude more files to get there. Still it's something to be aware of when deciding on volume size.

Dima
Re: [Bacula-users] Virtual tapes or virtual disks
> I have a RAID5 array of about 40TB in size. A separate RAID controller card handles the disks. I'm planning to use the normal ext4 file system. It's standard and well known, though most probably not the fastest. That will not have any great impact, as there is a 4TB NVMe SSD drive, which takes the edge off the slow physical disk performance.

Hi,

If you're going to use RAID, I'd recommend that you at least use a RAID-6 configuration. You don't want to risk losing all your backups if a drive fails and then, while the RAID-5 is rebuilding, you happen to have another drive failure or error.

cheers,
--tom
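The trade-off behind this recommendation can be sketched in Python: RAID-5 gives up one disk's worth of capacity to parity and survives one drive failure, while RAID-6 gives up two and survives two. The disk count and disk size in the example are hypothetical (the thread only says the array is about 40TB), and the function name is illustrative.

```python
def raid_summary(level: int, disks: int, disk_tb: float):
    """Return (usable_tb, failures_tolerated) for RAID-5 or RAID-6."""
    parity_disks = {5: 1, 6: 2}[level]   # disks' worth of parity per level
    if disks < parity_disks + 2:         # RAID-5 needs >= 3 disks, RAID-6 >= 4
        raise ValueError("too few disks for this RAID level")
    return (disks - parity_disks) * disk_tb, parity_disks


# Hypothetical array: 6 disks of 8 TB each.
print(raid_summary(5, 6, 8))   # RAID-5: more usable space, survives one failure
print(raid_summary(6, 6, 8))   # RAID-6: less usable space, survives two failures
```

The sketch says nothing about how likely a second failure is during a rebuild; it only shows what each level costs in capacity and how many failures it tolerates.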
Re: [Bacula-users] Virtual tapes or virtual disks
On 2022-01-26 18:42, dmitri maziuk wrote:
> If you use actual disks as "magazines" with vchanger, you need to pre-label the volumes. If you use just one big filesystem, you can let bacula do it for you (last I looked that functionality didn't work w/ autochangers).
>
> If you use disk "magazines" you also need to consider the whole-disk failure. If you use one big filesystem, use RAID (of course) to guard against those. But then you should look at the number of file volumes: some filesystems handle large numbers of directory entries better than others and you may want to balance the volume file size vs the number of directory entries.

Regarding the number of directory entries...

It is common to see a file system limit of 32000 subdirectories per directory. The limit on the number of files per directory is far bigger and is unlikely to be reached, especially not for this use case.

Regards!
--
Josip Deanovic
Re: [Bacula-users] Virtual tapes or virtual disks
On 2022-01-26 11:59 AM, Peter Milesson via Bacula-users wrote:
> ...
> I have a RAID5 array of about 40TB in size. A separate RAID controller card handles the disks. I'm planning to use the normal ext4 file system. It's standard and well known, though most probably not the fastest. That will not have any great impact, as there is a 4TB NVMe SSD drive, which takes the edge off the slow physical disk performance.

Yeah, we gave up on hardware RAID controllers long ago, but YMMV.

As for SSDs, if you spool the jobs you can run them in parallel with the spool->volume stream. You'd have to look at the numbers for your setup, but generally despooling off the SSD over the bus runs just fine while clients are spooling to it over the network.

Dima
Re: [Bacula-users] Virtual tapes or virtual disks
On 2022-01-26 18:06, Peter Milesson via Bacula-users wrote:
> I'm used to fixed volume sizes from the tape drives, I feel comfortable with it, and I do not need to relearn a lot to configure the Bacula system.
>
> The only thing I haven't found out is how to preallocate the number of volumes needed. Maybe there is no need, if the volumes are created automagically. Most of the RAID array will be used by Bacula, just leaving a couple of percent as free space.
>
> When using mhvtl, I started a script with the tape size and number of tapes I wanted, and the corresponding tape directories and volumes were created on the fly.
>
> Thanks Josip!

You are welcome. I would like to point out that people's different requirements will dictate different approaches.

Regarding preallocation of the volumes: if there is a way to do it, I am not aware of it. However, if you define the maximum volume size and the maximum number of volumes in the pool, you should be able to calculate the space needed. Just leave some free space, say twice the size of a volume, and you should be good. Later, when all the volumes are in use, you will see whether there is enough space to create yet another volume.

You can choose to label volumes yourself or leave that to Bacula. It's up to you.

If you intend to recycle your volumes automatically, make sure that your retention periods are short enough to expire before all the volumes are used. Otherwise Bacula will not be able to perform backups. The alternative would be to force recycling of the oldest volume, but this doesn't happen by default; the option must be explicitly turned on.

Regards!
--
Josip Deanovic
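The space calculation suggested here can be written out as a short Python sketch. It assumes a single pool with a fixed maximum volume size and a fixed maximum number of volumes, plus the suggested headroom of two volumes. The function names and example figures are illustrative and are not Bacula settings.

```python
def pool_space_needed_gb(max_volume_gb: int, max_volumes: int,
                         headroom_volumes: int = 2) -> int:
    """Disk space for a completely full pool plus some spare volumes' worth."""
    return max_volume_gb * (max_volumes + headroom_volumes)


def max_volumes_that_fit(array_gb: int, max_volume_gb: int,
                         headroom_volumes: int = 2) -> int:
    """Largest volume count that still leaves the headroom free."""
    return max(0, array_gb // max_volume_gb - headroom_volumes)


# Example: 10 GB volumes, a pool capped at 4000 volumes.
print(pool_space_needed_gb(10, 4000))
# Example: how many 10 GB volumes fit in 40960 GB with headroom kept free.
print(max_volumes_that_fit(40960, 10))
```

If several pools share the file system, the same calculation would be repeated per pool and the results added up.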
Re: [Bacula-users] Virtual tapes or virtual disks
On 26.01.2022 18:42, dmitri maziuk wrote:
> On 2022-01-26 11:06 AM, Peter Milesson via Bacula-users wrote:
>> ...
>> Your explanation of why to use smaller file volumes is much appreciated.
>> ...
>> The only thing I haven't found out is how to preallocate the number of volumes needed. Maybe there is no need, if the volumes are created automagically. Most of the RAID array will be used by Bacula, just leaving a couple of percent as free space.
>
> If you use actual disks as "magazines" with vchanger, you need to pre-label the volumes. If you use just one big filesystem, you can let bacula do it for you (last I looked that functionality didn't work w/ autochangers).
>
> If you use disk "magazines" you also need to consider the whole-disk failure. If you use one big filesystem, use RAID (of course) to guard against those. But then you should look at the number of file volumes: some filesystems handle large numbers of directory entries better than others and you may want to balance the volume file size vs the number of directory entries.
>
> For single filesystem, I suggest using ZFS instead of a traditional RAID if you can: you can later grow it on-line by replacing disks w/ bigger ones when (not if) you need to.
>
> Dima

Thanks for your input Dima.

I have a RAID5 array of about 40TB in size. A separate RAID controller card handles the disks. I'm planning to use the normal ext4 file system. It's standard and well known, though most probably not the fastest. That will not have any great impact, as there is a 4TB NVMe SSD drive, which takes the edge off the slow physical disk performance.

Best regards,
Peter
Re: [Bacula-users] Virtual tapes or virtual disks
On 2022-01-26 11:06 AM, Peter Milesson via Bacula-users wrote:
> ...
> Your explanation of why to use smaller file volumes is much appreciated.
> ...
> The only thing I haven't found out is how to preallocate the number of volumes needed. Maybe there is no need, if the volumes are created automagically. Most of the RAID array will be used by Bacula, just leaving a couple of percent as free space.

If you use actual disks as "magazines" with vchanger, you need to pre-label the volumes. If you use just one big filesystem, you can let bacula do it for you (last I looked that functionality didn't work w/ autochangers).

If you use disk "magazines" you also need to consider the whole-disk failure. If you use one big filesystem, use RAID (of course) to guard against those. But then you should look at the number of file volumes: some filesystems handle large numbers of directory entries better than others and you may want to balance the volume file size vs the number of directory entries.

For single filesystem, I suggest using ZFS instead of a traditional RAID if you can: you can later grow it on-line by replacing disks w/ bigger ones when (not if) you need to.

Dima
Re: [Bacula-users] Virtual tapes or virtual disks
On 26.01.2022 0:02, Josip Deanovic wrote:
> On 23.01.2022 11:37, Radosław Korzeniewski wrote:
>> Hello,
>>
>> Fri, 21 Jan 2022 at 14:22 Peter Milesson via Bacula-users wrote:
>>> If somebody has got experience with disk based, multi volume Bacula backup, I would be grateful for some information (tips, what to expect, pitfalls, etc.).
>>
>> The best approach IMVHO (and not only in my opinion) is to configure one job = one volume. You will get no real benefit from limiting the size of a single volume. In the single volume = single job configuration you can set up job retention very easily, as purging a volume will purge a single job only. There is no need to "wait" for a particular volume to fill up before retention starts. Purging a volume affects a single job only. And finally, you end up with far fewer volumes than when limiting the size to e.g. 10G.
>
> There are many different approaches which can fit different requirements.
>
> I don't see the benefit of having a single job per volume, as Bacula is tracking media, files, jobs and everything else. That's why Bacula has a catalog, which allows the backup system to determine the location and state of volumes, jobs, files, etc.
>
> To logically separate backup data I use pools and leave the rest to Bacula.
>
> When Bacula needs a particular file volume and it's available, Bacula will simply use it. If it's not, or if we are using a tape volume which is currently not in the tape drive/library, Bacula will ask for the volume by name.
>
> The number of smaller file volumes (e.g. 10GB) is not an issue, as Bacula handles them correctly and automatically (provided that Bacula is correctly configured, of course).
>
> I'll go through a few examples where smaller file volumes (e.g. 10GB) could prove useful:
>
> 1. If the catalog database gets corrupted or completely lost, small volumes are easier and faster to handle, and it is easier to determine which volumes contain the database backup. That makes the process of importing the data into a new catalog database using a tool such as bscan easier.
>
> 2. Similar to 1), it is easier to manage small file volumes and extract particular jobs from a volume using the bextract tool.
>
> 3. If space is an issue (as it usually is), bigger volumes tend to eat more space which cannot be reused (the volume cannot be recycled) as long as the volume contains a single job we want to preserve.
>
> 4. Although I don't like that approach, sometimes people choose to sync or copy whole file volumes to a secondary location using the usual tools such as rsync, cp and similar. In such a case it is better to keep file volumes small.
>
> 5. When recycling a file volume, it takes longer to wipe a bigger file volume. A smaller volume takes less time to recycle, leaving more time windows in which other tasks can benefit from I/O performance. With large file volumes, all other tasks would have to fight for the opportunity to access the file system, and that gets more obvious when a slow network file system is being used.
>
> 6. In case of any kind of corruption of a file volume due to file system corruption or damage in transport, it is likely that less data will be lost with a smaller file volume. And again, it's easier to handle a smaller file volume when trying to recover pieces of data.
>
> Regards!

Great Post!

Your explanation of why to use smaller file volumes is much appreciated.

The truth is, most files are fairly small, particularly files created by office users. They range from a few kbytes up to some tens of megabytes. Videos can be huge, but I guess most companies handle instruction videos and similar, not full blown movies. This type of content very seldom exceeds 1GB. So a 10Gbyte volume limit seems to be a good balance.

I'm used to fixed volume sizes from the tape drives, I feel comfortable with it, and I do not need to relearn a lot to configure the Bacula system.

The only thing I haven't found out is how to preallocate the number of volumes needed. Maybe there is no need, if the volumes are created automagically. Most of the RAID array will be used by Bacula, just leaving a couple of percent as free space.

When using mhvtl, I started a script with the tape size and number of tapes I wanted, and the corresponding tape directories and volumes were created on the fly.

Thanks Josip!

Best regards,
Peter
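Point 3 in the quoted list (space that cannot be reused while a volume still holds one job worth keeping) can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not how Bacula itself tracks volumes: it assumes jobs are written back to back into fixed-size volumes, that each job is simply "expired" or "not expired", and that a volume can be recycled only when no unexpired job has data on it. The function name and the job sizes are made up for illustration.

```python
def stuck_expired_gb(jobs, volume_gb):
    """Return GB of expired job data that cannot be reclaimed.

    jobs: list of (size_gb, expired) tuples, written back to back.
    A volume is reusable only when no unexpired job has data on it.
    """
    expired_on = {}   # volume index -> GB of expired data stored there
    live = set()      # volume indexes that hold unexpired data
    offset = 0        # current write position across all volumes, in GB
    for size_gb, expired in jobs:
        remaining = size_gb
        while remaining > 0:
            vol = offset // volume_gb                 # volume being written
            free = volume_gb - offset % volume_gb     # space left in it
            chunk = min(free, remaining)
            if expired:
                expired_on[vol] = expired_on.get(vol, 0) + chunk
            else:
                live.add(vol)
            offset += chunk
            remaining -= chunk
    # Expired data is stuck wherever it shares a volume with live data.
    return sum(gb for vol, gb in expired_on.items() if vol in live)


# Hypothetical workload: 100 GB in total, only the 5 GB job must be kept.
jobs = [(30, True), (5, False), (45, True), (20, True)]
print(stuck_expired_gb(jobs, 10))    # 10 GB volumes
print(stuck_expired_gb(jobs, 100))   # one 100 GB volume
```

In this made-up workload the 10 GB volumes leave only the expired data sharing a volume with the kept job unreclaimable, while a single 100 GB volume keeps all of the expired data pinned.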
[Bacula-users] upgrade from 9.4.2 to 11.0.5
Hello,

Simple question: on Debian 11, for an upgrade from v9.4.2 (MySQL + Baculum) to v11, are there any difficulties in doing it from source? I have always waited for the Debian repository version, and it is still at 9.4.2...

Should I upgrade to 9.6 first? Has anybody already done this successfully?

I assume I must also upgrade the FD client on Windows systems?