Re: [Bacula-users] Backup of in one directory with 800.000 files
Hi!

On Wed, Dec 3, 2008 at 10:46 AM, Tobias Bartel [EMAIL PROTECTED] wrote:

> Hello,
>
> sorry for my late response. I just finished migrating the catalog to a
> PostgreSQL 8.1 server and enabled spooling. Spooling space is 100 GB, the
> horror directory only 70 GB. I hope that both changes will give us a
> speed bump: PostgreSQL should increase the performance, and the spool
> directory should at least decrease the tape wear.

1. Did you optimize the PostgreSQL database to actually use the server? (If
you stick with the defaults, you will see bad performance with many
entries.)

2. Please measure the speed of the spool space: if it is under 30 MB/s, it
is not good for your LTO-3 drive. I use these commands to get an
approximation of disk speed:

    dd if=/dev/zero of=cosa.img bs=10M count=100 conv=fsync    (write performance)
    dd if=cosa.img of=/dev/null bs=10M iflag=direct            (read performance)

> I just started another test, let's see how long it will take ;)

If everything is OK, it should take around 4 to 20 hours (max).

I hope this helps,

Ildefonso Camargo

> cya
> tobi
>
> On Thursday, 27.11.2008, at 12:25 -0500, Ryan Novosielski wrote:
>
>> Tobias Bartel wrote:
>>>> Even with 800,000 files, that sounds very slow. How much data is
>>>> involved, how is it stored and how fast is your database server?
>>>
>>> It's about 70 GB of data, stored on a RAID 5 (3ware controller). The
>>> database is an SQLite one, on the same machine but on a software
>>> RAID 1. The backup device is an LTO-3 connected via SCSI; the OS is
>>> Debian stable.
>>>
>>> I already thought about moving the database to MySQL, but there is
>>> already a MySQL server on the same box: it is a slave for our MySQL
>>> master and is used for hourly backups of our database (stop the
>>> replication, do the backups, start the replication again). I don't
>>> really like the idea of adding a DB to the slave that isn't on the
>>> master, nor do I like the idea of hacking up some custom MySQL
>>> install that runs in parallel, because that will cost me with every
>>> future update.
>>>
>>> To be honest, I didn't expect that SQLite could be the bottleneck; it
>>> just can't be that slow. What made me think that it's the number of
>>> files is that when I do an ls in that directory, it takes ~15 min
>>> before I see any output.
>>
>> My understanding is that you cannot expect decent performance out of
>> SQLite for Bacula for any production-level backup. I could be wrong
>> here, but I say forget about SQLite for anything other than a trial,
>> and definitely don't use it for a backup that is extra demanding.
>>
>> You could use PostgreSQL if you wanted to avoid messing with the slave
>> server (though something tells me that's not a major worry, but I am
>> not sure about it), or just run MySQL on a different port, which I
>> don't think is all that hard (or, actually, use it in socket-only
>> mode, which is even easier and I think would suffice).
>>
>> --
>> Ryan Novosielski - Systems Programmer II
>> [EMAIL PROTECTED] - 973/972.0922 (2-0922)
>> Univ. of Med. and Dent. | IST/AST - NJMS Medical Science Bldg - C630

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's
challenge: build the coolest Linux based applications with Moblin SDK and
win great prizes. Grand prize is a trip for two to an Open Source event
anywhere in the world.
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
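Ildefonso's two dd commands can be wrapped into a small reusable check. A minimal sketch, assuming a POSIX shell with GNU dd; the function name and test-file name are ours, not anything Bacula-specific. `iflag=direct` bypasses the page cache so the read rate is honest, but not every filesystem supports O_DIRECT, hence the fallback:

```shell
# Rough throughput check for a spool directory, built around the dd
# commands quoted above. Usage: spool_speed /path/to/spool 100
# (100 blocks of 10 MB = 1 GB; dd prints its MB/s figures on stderr).
spool_speed() {
    dir=$1
    count=$2
    img="$dir/speedtest.img"

    # Write test: conv=fsync forces a flush to disk before dd reports a rate.
    dd if=/dev/zero of="$img" bs=10M count="$count" conv=fsync

    # Read test: iflag=direct avoids the page cache. Where O_DIRECT is
    # unsupported, fall back to a plain (cached, so optimistic) read.
    if ! dd if="$img" of=/dev/null bs=10M iflag=direct; then
        echo "O_DIRECT not supported here; cached read rate follows" >&2
        dd if="$img" of=/dev/null bs=10M
    fi

    rm -f "$img"
}
```

For an LTO-3 drive, a spool disk that sustains well under 30 MB/s will leave the drive starved, which is exactly the tape-wear problem spooling was meant to avoid.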
Re: [Bacula-users] Backup of in one directory with 800.000 files
Hello,

thanks for the suggestion, it's definitely worth a shot. Sooner or later we
will need a script to sort the files anyway ;)

cya
tobi

On Friday, 28.11.2008, at 04:50 -0800, Kevin Keane wrote:

> This might be a totally off-the-wall idea; it might not even work, or if
> it does, it may not help in the end. But how about building your own
> subdirectory structure? Leave all 800,000 files in the directory where
> they are, and create your own parallel directory structure with
> subdirectories as you like, for instance one per day. Then write a
> script that creates links (maybe soft links will work, else use hard
> links) from the original files into your subdirectories. You will
> probably need to run this script daily.
>
> If you use hard links, you may be able to use the reference count of
> each file to find out which files have already been linked into your
> directories, so you create new links only for those files with a
> reference count of 1.
>
> That way, the fax application will continue seeing the faxes where it
> expects them, and at the same time you have the files organized in a
> backup-friendly structure. Now, instead of backing up the single huge
> directory, back up the directory structure you created yourself.
>
> Whether this will work, I don't know; I'm just trying to be creative.
> Maybe running this script and creating all the links will also take 48
> hours...
>
> Tobias Bartel wrote:
>> Hello,
>>
>> I am tasked with setting up daily full backups of our entire fax
>> communication, and the faxes are all stored in one single directory ;).
>> There are about 800,000 files in that directory, which makes accessing
>> it extremely slow. The target device is an LTO-3 tape drive with an
>> 8-slot changer.
>>
>> With my current configuration, Bacula needs ~48 h to make a complete
>> backup, which kinda conflicts with the requirement of doing dailies.
>> Does anybody have any suggestions on how I could speed things up?
>>
>> Thanks in advance - Tobi
>>
>> PS: I already talked to my boss and our developers, and they will
>> change the system so the faxes get stored in subdirectories. But that
>> change doesn't have a very high priority and got scheduled for next
>> summer.
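Kevin's idea can be sketched directly in shell. This is a sketch under assumptions: the function and directory names are invented, the per-day layout is just the example he gives, and it uses hard links, so the parallel tree must live on the same filesystem as the fax directory. `find -links 1` picks up exactly the files whose reference count is still 1, i.e. the ones not linked anywhere yet:

```shell
# Link not-yet-linked faxes from the flat directory into a per-day subtree,
# so the backup can walk small daily directories instead of one huge one.
# Intended to run daily (e.g. from cron). Arguments are placeholders.
link_new_faxes() {
    src=$1
    tree=$2
    today="$tree/$(date +%Y-%m-%d)"     # one subdirectory per day
    mkdir -p "$today"
    # -links 1: only files with no second hard link yet, so re-running
    # the script never tries to link the same fax twice.
    find "$src" -maxdepth 1 -type f -links 1 -exec ln {} "$today" \;
}
```

For example, `link_new_faxes /var/spool/faxes /var/spool/faxes-by-day` (paths hypothetical), with the Bacula FileSet then pointed at the by-day tree instead of the flat directory.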
Re: [Bacula-users] Backup of in one directory with 800.000 files
Hello,

sorry for my late response. I just finished migrating the catalog to a
PostgreSQL 8.1 server and enabled spooling. Spooling space is 100 GB, the
horror directory only 70 GB. I hope that both changes will give us a speed
bump: PostgreSQL should increase the performance, and the spool directory
should at least decrease the tape wear.

I just started another test, let's see how long it will take ;)

cya
tobi

On Thursday, 27.11.2008, at 12:25 -0500, Ryan Novosielski wrote:

> Tobias Bartel wrote:
>>> Even with 800,000 files, that sounds very slow. How much data is
>>> involved, how is it stored and how fast is your database server?
>>
>> It's about 70 GB of data, stored on a RAID 5 (3ware controller). The
>> database is an SQLite one, on the same machine but on a software
>> RAID 1. The backup device is an LTO-3 connected via SCSI; the OS is
>> Debian stable.
>>
>> I already thought about moving the database to MySQL, but there is
>> already a MySQL server on the same box: it is a slave for our MySQL
>> master and is used for hourly backups of our database (stop the
>> replication, do the backups, start the replication again). I don't
>> really like the idea of adding a DB to the slave that isn't on the
>> master, nor do I like the idea of hacking up some custom MySQL install
>> that runs in parallel, because that will cost me with every future
>> update.
>>
>> To be honest, I didn't expect that SQLite could be the bottleneck; it
>> just can't be that slow. What made me think that it's the number of
>> files is that when I do an ls in that directory, it takes ~15 min
>> before I see any output.
>
> My understanding is that you cannot expect decent performance out of
> SQLite for Bacula for any production-level backup. I could be wrong
> here, but I say forget about SQLite for anything other than a trial,
> and definitely don't use it for a backup that is extra demanding.
>
> You could use PostgreSQL if you wanted to avoid messing with the slave
> server (though something tells me that's not a major worry, but I am
> not sure about it), or just run MySQL on a different port, which I
> don't think is all that hard (or, actually, use it in socket-only mode,
> which is even easier and I think would suffice).
>
> --
> Ryan Novosielski - Systems Programmer II
> [EMAIL PROTECTED] - 973/972.0922 (2-0922)
> Univ. of Med. and Dent. | IST/AST - NJMS Medical Science Bldg - C630
Re: [Bacula-users] Backup of in one directory with 800.000 files
This might be a totally off-the-wall idea; it might not even work, or if it
does, it may not help in the end. But how about building your own
subdirectory structure? Leave all 800,000 files in the directory where they
are, and create your own parallel directory structure with subdirectories
as you like, for instance one per day. Then write a script that creates
links (maybe soft links will work, else use hard links) from the original
files into your subdirectories. You will probably need to run this script
daily.

If you use hard links, you may be able to use the reference count of each
file to find out which files have already been linked into your
directories, so you create new links only for those files with a reference
count of 1.

That way, the fax application will continue seeing the faxes where it
expects them, and at the same time you have the files organized in a
backup-friendly structure. Now, instead of backing up the single huge
directory, back up the directory structure you created yourself.

Whether this will work, I don't know; I'm just trying to be creative. Maybe
running this script and creating all the links will also take 48 hours...

Tobias Bartel wrote:
> Hello,
>
> I am tasked with setting up daily full backups of our entire fax
> communication, and the faxes are all stored in one single directory ;).
> There are about 800,000 files in that directory, which makes accessing
> it extremely slow. The target device is an LTO-3 tape drive with an
> 8-slot changer.
>
> With my current configuration, Bacula needs ~48 h to make a complete
> backup, which kinda conflicts with the requirement of doing dailies.
> Does anybody have any suggestions on how I could speed things up?
>
> Thanks in advance - Tobi
>
> PS: I already talked to my boss and our developers, and they will change
> the system so the faxes get stored in subdirectories. But that change
> doesn't have a very high priority and got scheduled for next summer.

--
Kevin Keane
Owner, The NetTech
Turn your NetWORRY into a NetWORK!
Office: 866-642-7116
http://www.4nettech.com

This e-mail and attachments, if any, may contain confidential and/or
proprietary information. Please be advised that the unauthorized use or
disclosure of the information is strictly prohibited. The information
herein is intended only for use by the intended recipient(s) named above.
If you have received this transmission in error, please notify the sender
immediately and permanently delete the e-mail and any copies, printouts or
attachments thereof.
Re: [Bacula-users] Backup of in one directory with 800.000 files
Tobias Bartel wrote:
>> Even with 800,000 files, that sounds very slow. How much data is
>> involved, how is it stored and how fast is your database server?
>
> It's about 70 GB of data, stored on a RAID 5 (3ware controller). The
> database is an SQLite one, on the same machine but on a software RAID 1.

You obviously have a very performance-sensitive situation, and writing to a
software RAID can seriously affect performance. Any chance you can move the
database to a hardware RAID? For that matter, ideally I would recommend
completely eliminating the software RAID from that machine if you can, or
at least putting only read-only data on it, such as the /usr directory.

--
Kevin Keane
Owner, The NetTech
Turn your NetWORRY into a NetWORK!
Office: 866-642-7116
http://www.4nettech.com
Re: [Bacula-users] Backup of in one directory with 800.000 files
On Thu, 27 Nov 2008, Tobias Bartel wrote:

> The database is an SQLite one, on the same machine but on a software
> RAID 1.

SQLite is really only intended for testing, not production systems. Switch
to PostgreSQL or MySQL and things should improve.
[Bacula-users] Backup of in one directory with 800.000 files
Hello,

I am tasked with setting up daily full backups of our entire fax
communication, and the faxes are all stored in one single directory ;).
There are about 800,000 files in that directory, which makes accessing it
extremely slow. The target device is an LTO-3 tape drive with an 8-slot
changer.

With my current configuration, Bacula needs ~48 h to make a complete
backup, which kinda conflicts with the requirement of doing dailies. Does
anybody have any suggestions on how I could speed things up?

Thanks in advance - Tobi

PS: I already talked to my boss and our developers, and they will change
the system so the faxes get stored in subdirectories. But that change
doesn't have a very high priority and got scheduled for next summer.
Re: [Bacula-users] Backup of in one directory with 800.000 files
Tobias Bartel wrote:
> Hello,
>
> I am tasked with setting up daily full backups of our entire fax
> communication, and the faxes are all stored in one single directory ;).
> There are about 800,000 files in that directory, which makes accessing
> it extremely slow. The target device is an LTO-3 tape drive with an
> 8-slot changer.
>
> With my current configuration, Bacula needs ~48 h to make a complete
> backup, which kinda conflicts with the requirement of doing dailies.
> Does anybody have any suggestions on how I could speed things up?

Even with 800,000 files, that sounds very slow. How much data is involved,
how is it stored and how fast is your database server?

--
James Cort
IT Manager
U4EA Technologies Ltd.
http://www.u4eatech.com
Re: [Bacula-users] Backup of in one directory with 800.000 files
Hi,

27.11.2008 17:10, Tobias Bartel wrote:
> Hello,
>
> I am tasked with setting up daily full backups of our entire fax
> communication, and the faxes are all stored in one single directory ;).
> There are about 800,000 files in that directory, which makes accessing
> it extremely slow. The target device is an LTO-3 tape drive with an
> 8-slot changer.
>
> With my current configuration, Bacula needs ~48 h to make a complete
> backup, which kinda conflicts with the requirement of doing dailies.
> Does anybody have any suggestions on how I could speed things up?

A more or less identical question is being discussed in the thread "Large
maildir backup" right now. Without further details, the ideas so far are
database tuning and using a file system that handles many files per
directory better. Regarding file systems, XFS works quite well here and is
said to be suitable for large directories.

> Thanks in advance - Tobi
>
> PS: I already talked to my boss and our developers, and they will change
> the system so the faxes get stored in subdirectories. But that change
> doesn't have a very high priority and got scheduled for next summer.

Then tell them that daily full backups of the fax system can be scheduled
for next autumn ;-)

Arno

--
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de
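The database tuning mentioned here mostly means raising PostgreSQL's very conservative defaults. A hedged sketch of a postgresql.conf fragment for the 8.1 era follows; the values are illustrative starting points for a machine with about 1 GB of RAM, not measured recommendations, and note that 8.1 counts shared_buffers and effective_cache_size in 8 kB pages and work_mem in kB:

```ini
# postgresql.conf (PostgreSQL 8.1) - illustrative, untuned starting points
shared_buffers = 25600         # 8 kB pages, ~200 MB (default is only a few MB)
work_mem = 32768               # kB per sort/hash; helps large catalog queries
effective_cache_size = 65536   # 8 kB pages, ~512 MB; what the OS likely caches
checkpoint_segments = 16       # fewer checkpoint pauses during bulk inserts
```

A restart is needed for shared_buffers to take effect; the right numbers depend on how much RAM the box can spare next to Bacula and the tape spooling.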
Re: [Bacula-users] Backup of in one directory with 800.000 files
Tobias Bartel wrote:
> Hello,
>
> I am tasked with setting up daily full backups of our entire fax
> communication, and the faxes are all stored in one single directory ;).
> There are about 800,000 files in that directory, which makes accessing
> it extremely slow. The target device is an LTO-3 tape drive with an
> 8-slot changer.

What's the average file size? It may be that you're simply hitting the
filesystem's performance limits. Try measuring the speed with tar:

    time tar cf /dev/null /path/to/files

Alternatively:

    tar cf - /path/to/files | pv > /dev/null

pv is a pipe that can measure throughput. I've tried this with some of the
filesystems I have: they can go all the way down to 2-5 MB/s when there is
a huge number of really small files. That gives you a rough feeling for the
time spent just reading the files off disk. (If you spool data to disk
before tape, then you should add in some time for that.)

> With my current configuration, Bacula needs ~48 h to make a complete
> backup, which kinda conflicts with the requirement of doing dailies.
> Does anybody have any suggestions on how I could speed things up?

Try to find out where the bottleneck in your system is. It may be the
catalog that's too slow, or it may be that you should disable spooling.

> PS: I already talked to my boss and our developers, and they will change
> the system so the faxes get stored in subdirectories. But that change
> doesn't have a very high priority and got scheduled for next summer.

Based on the above investigations, it may be that they should do it sooner
rather than later. Also read the thread about the large maildir; that's
basically the same issue.

--
Jesper
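One caveat to the commands above: GNU tar detects when the archive file is /dev/null and may skip reading the member contents entirely, so `time tar cf /dev/null ...` can report an absurdly fast time. Writing the archive to a pipe avoids that shortcut. A minimal sketch, with the function name ours and `wc -c` standing in for pv (which may not be installed):

```shell
# Measure how fast a file set can actually be read off disk.
# GNU tar special-cases "cf /dev/null", so stream the archive through a
# pipe instead; bytes divided by elapsed wall time gives the read rate.
read_speed() {
    dir=$1
    # swap "wc -c" for "pv > /dev/null" to watch a live throughput figure
    tar cf - "$dir" | wc -c
}
```

Run it as, e.g., `time read_speed /path/to/faxes` (in bash, where `time` works on functions) and divide the byte count by the elapsed time.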
Re: [Bacula-users] Backup of in one directory with 800.000 files
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tobias Bartel wrote:
>> Even with 800,000 files, that sounds very slow. How much data is
>> involved, how is it stored and how fast is your database server?
>
> It's about 70 GB of data, stored on a RAID 5 (3ware controller). The
> database is an SQLite one, on the same machine but on a software RAID 1.
> The backup device is an LTO-3 connected via SCSI; the OS is Debian
> stable.
>
> I already thought about moving the database to MySQL, but there is
> already a MySQL server on the same box: it is a slave for our MySQL
> master and is used for hourly backups of our database (stop the
> replication, do the backups, start the replication again). I don't
> really like the idea of adding a DB to the slave that isn't on the
> master, nor do I like the idea of hacking up some custom MySQL install
> that runs in parallel, because that will cost me with every future
> update.
>
> To be honest, I didn't expect that SQLite could be the bottleneck; it
> just can't be that slow. What made me think that it's the number of
> files is that when I do an ls in that directory, it takes ~15 min before
> I see any output.

My understanding is that you cannot expect decent performance out of SQLite
for Bacula for any production-level backup. I could be wrong here, but I
say forget about SQLite for anything other than a trial, and definitely
don't use it for a backup that is extra demanding.

You could use PostgreSQL if you wanted to avoid messing with the slave
server (though something tells me that's not a major worry, but I am not
sure about it), or just run MySQL on a different port, which I don't think
is all that hard (or, actually, use it in socket-only mode, which is even
easier and I think would suffice).

- --
Ryan Novosielski - Systems Programmer II
[EMAIL PROTECTED] - 973/972.0922 (2-0922)
Univ. of Med. and Dent. | IST/AST - NJMS Medical Science Bldg - C630

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJLtgHmb+gadEcsb4RApbKAJ4gTg9fF8susc4iS6e44D9s7uWTxwCg2T/n
hd0IuSIG6mg6J4FPrL/aRz8=
=M8R0
-----END PGP SIGNATURE-----
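Ryan's "different port / socket-only" option does not require hacking up a custom MySQL build: a second mysqld can run from its own defaults file. A sketch under assumptions (the file name and all paths are invented; `skip-networking`, `socket`, and `--defaults-file` are standard MySQL options):

```ini
# /etc/mysql/my-bacula.cnf - a second mysqld instance just for the Bacula
# catalog, started with:
#   mysqld_safe --defaults-file=/etc/mysql/my-bacula.cnf &
[mysqld]
datadir  = /var/lib/mysql-bacula
socket   = /var/run/mysqld/mysqld-bacula.sock
pid-file = /var/run/mysqld/mysqld-bacula.pid
skip-networking            # socket-only: no TCP port, no clash with the slave
# ...or drop skip-networking and pick a non-default port instead:
# port = 3307
```

Bacula's director would then connect through that socket, while the replication slave keeps the default port and data directory untouched.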
Re: [Bacula-users] Backup of in one directory with 800.000 files
Tobias Bartel wrote:
>> Even with 800,000 files, that sounds very slow. How much data is
>> involved, how is it stored and how fast is your database server?
>
> It's about 70 GB of data, stored on a RAID 5 (3ware controller). The
> database is an SQLite one, on the same machine but on a software RAID 1.
> The backup device is an LTO-3 connected via SCSI; the OS is Debian
> stable.
>
> I already thought about moving the database to MySQL, but there is
> already a MySQL server on the same box: it is a slave for our MySQL
> master and is used for hourly backups of our database (stop the
> replication, do the backups, start the replication again). I don't
> really like the idea of adding a DB to the slave that isn't on the
> master, nor do I like the idea of hacking up some custom MySQL install
> that runs in parallel, because that will cost me with every future
> update.

Perhaps a PostgreSQL on the same host?

> To be honest, I didn't expect that SQLite could be the bottleneck; it
> just can't be that slow. What made me think that it's the number of
> files is that when I do an ls in that directory, it takes ~15 min before
> I see any output.

That's more likely to be ls playing tricks on you: plain ls sorts all the
entries before printing anything. Try:

    ls -f | head

(or just ls -f, which skips the sorting)

--
Jesper