Re: [Bacula-users] Catalogue snapshot utility : any interest?
On 04/10/10, James Harper (james.har...@bendigoit.com.au) wrote:
> > A full pg_dump of the catalogue is 2.8G. The output of the catalogue
> > snapshot for job 60 is 1.6G. Naturally, the full pg_dump of the whole
> > database will continue to grow over time.
> >
> > I'm a little surprised that the proportion of job 60 to the whole is so
> > high. Job 60 is similar to job 1, but I don't expect they share much
> > information. I'll have to look into that.
>
> If jobid 60 and job 1 were the same backup job then a lot of the
> information may be shared in the filename table. Even if they are backups
> of similar servers then they will share a lot of filename data, and that
> filename data has to come with the extracted catalogue, so you might not
> be saving that much.

My backups are all full backups. Also, the key file table in postgres
(which joins files and paths) is job specific, so I'm not sure where any
duplication is emanating from.

Regards
Rory

                         Table public.file
   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | bigint  | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null

-- 
Rory Campbell-Lange
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928

------------------------------------------------------------------------------
Virtualization is moving to the mainstream and overtaking non-virtualized
environment for deploying applications. Does it make network security
easier or more difficult to achieve? Read this whitepaper to separate the
two and get a better understanding.
http://p.sf.net/sfu/hp-phase2-d2d
______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
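One way to settle the duplication question raised above would be to compare
the distinct filenameids referenced by two jobs' rows in the file table. A
minimal sketch, assuming the id sets have already been fetched from the
catalogue (the function name and toy data are hypothetical, not part of the
posted tool):

```python
# Hypothetical sketch: estimate how much filename data two jobs share.
# Against a real catalogue the id sets would come from a query such as:
#   SELECT DISTINCT filenameid FROM file WHERE jobid = %s
def shared_filename_fraction(job_a_ids, job_b_ids):
    """Fraction of job A's distinct filenameids also referenced by job B."""
    a, b = set(job_a_ids), set(job_b_ids)
    return len(a & b) / len(a) if a else 0.0

# Toy stand-in data for two jobs' filenameid columns:
# ids 12 and 13 appear in both jobs, so two of three are shared.
print(shared_filename_fraction([12, 13, 14], [10, 11, 12, 13]))
```

A high fraction would support the shared-filename explanation; a low one
would point at the job-specific file table rows themselves.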
Re: [Bacula-users] Catalogue snapshot utility : any interest?
On 04/10/10, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
> > > I have developed a catalogue snapshot facility in python to snapshot
> > > one job's catalogue and dump it to disk.
> > > ...
> >
> > How much smaller is the catalogue subset vs the full catalogue?
>
> Good question. I'm not able to answer that question fully at present as I
> don't have enough jobs in my current database to know. My current
> database has the following jobs in it:
>
>  jobid | jobfiles | jobgigs
> -------+----------+---------
>      1 |  7706717 | 6833.90
>      8 |  3965507 | 4480.83
>      9 |  1273459 |  129.87
>     50 |   646336 |  512.07
>     60 |  7845561 | 6990.67
>
> A full pg_dump of the catalogue is 2.8G. The output of the catalogue
> snapshot for job 60 is 1.6G. Naturally, the full pg_dump of the whole
> database will continue to grow over time. (The job 60 catalogue file
> compresses to about 300MB with bzip2 -9.)
>
> I'm a little surprised that the proportion of job 60 to the whole is so
> high. Job 60 is similar to job 1, but I don't expect they share much
> information. I'll have to look into that.

If jobid 60 and job 1 were the same backup job then a lot of the
information may be shared in the filename table. Even if they are backups
of similar servers then they will share a lot of filename data, and that
filename data has to come with the extracted catalogue, so you might not
be saving that much.

James
Re: [Bacula-users] Catalogue snapshot utility : any interest?
On 10/04/10 07:22, Rory Campbell-Lange wrote:
> I have developed a catalogue snapshot facility in python to snapshot one
> job's catalogue and dump it to disk. The snapshot provides a bacula
> database schema file, a database dump of the job's data, and a file
> listing showing information such as the tape number, path, file, md5 and
> lstat. We intend to include the catalogue in compressed format on CDs
> accompanying tape sets to assist our clients in retrieving data in future
> if required.
>
> At present the system works only for PostgreSQL, and for our setup, which
> has the director, storage and file daemons on the same Linux server.
>
> How it works:
>
> * A temporary schema is made in postgres, named job_%d % (jobid)
> * Relevant data is selected from the public schema to the temporary schema
> * The file listing is output
> * The public schema is dumped
> * The temporary schema is dumped
> * The temporary schema is removed
>
> I'm considering making an SQLite database from the temporary schema to
> obviate the need for the public schema file and file listing.
>
> This is fairly simple stuff, but if this functionality is useful to you,
> do let me know and I can share the programme with you.

This sounds like a useful tool for any Bacula site that's managing Bacula
backups for a large number of clients.

-- 
Phil Stracchino, CDK#2
DoD#299792458 ICBM: 43.5607, -71.355
ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, Free Stater
It's not the years, it's the mileage.
Re: [Bacula-users] Catalogue snapshot utility : any interest?
On 04/10/10, James Harper (james.har...@bendigoit.com.au) wrote:
> > I have developed a catalogue snapshot facility in python to snapshot
> > one job's catalogue and dump it to disk.
> > ...
>
> How much smaller is the catalogue subset vs the full catalogue?

Good question. I'm not able to answer that question fully at present as I
don't have enough jobs in my current database to know. My current database
has the following jobs in it:

 jobid | jobfiles | jobgigs
-------+----------+---------
     1 |  7706717 | 6833.90
     8 |  3965507 | 4480.83
     9 |  1273459 |  129.87
    50 |   646336 |  512.07
    60 |  7845561 | 6990.67

A full pg_dump of the catalogue is 2.8G. The output of the catalogue
snapshot for job 60 is 1.6G. Naturally, the full pg_dump of the whole
database will continue to grow over time. (The job 60 catalogue file
compresses to about 300MB with bzip2 -9.)

I'm a little surprised that the proportion of job 60 to the whole is so
high. Job 60 is similar to job 1, but I don't expect they share much
information. I'll have to look into that.

Regards
Rory

-- 
Rory Campbell-Lange
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
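The compression step mentioned above can be mirrored in Python, since the
tool is written in Python anyway. This is an illustrative sketch, not the
posted script: `bz2.compress` with `compresslevel=9` matches `bzip2 -9`,
and the sample line is made-up stand-in data.

```python
import bz2

# Sketch of the compression step: Python's bz2 module at compresslevel=9
# behaves like the `bzip2 -9` used on the job 60 catalogue file.
def compress_dump(data: bytes) -> bytes:
    return bz2.compress(data, compresslevel=9)

# Catalogue dumps are highly repetitive (paths, hashes, lstat blobs),
# which is why a 1.6G snapshot can shrink to roughly 300MB.
sample = b"/var/www/site/index.html|d41d8cd98f|lstat-blob\n" * 10000
packed = compress_dump(sample)
print(len(packed) < len(sample))  # repetitive dumps shrink dramatically
```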
[Bacula-users] Catalogue snapshot utility : any interest?
I have developed a catalogue snapshot facility in python to snapshot one
job's catalogue and dump it to disk. The snapshot provides a bacula
database schema file, a database dump of the job's data, and a file
listing showing information such as the tape number, path, file, md5 and
lstat. We intend to include the catalogue in compressed format on CDs
accompanying tape sets to assist our clients in retrieving data in future
if required.

At present the system works only for PostgreSQL, and for our setup, which
has the director, storage and file daemons on the same Linux server.

How it works:

* A temporary schema is made in postgres, named job_%d % (jobid)
* Relevant data is selected from the public schema to the temporary schema
* The file listing is output
* The public schema is dumped
* The temporary schema is dumped
* The temporary schema is removed

I'm considering making an SQLite database from the temporary schema to
obviate the need for the public schema file and file listing.

This is fairly simple stuff, but if this functionality is useful to you,
do let me know and I can share the programme with you.

Regards
Rory

-- 
Rory Campbell-Lange
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
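The per-job snapshot steps described above can be sketched as generated
SQL. The schema name job_%d comes from the posting; the particular tables
copied and the statements themselves are assumptions for illustration, not
the actual script (dumping would happen between creation and removal, via
pg_dump --schema).

```python
# Sketch of the snapshot steps: create a per-job schema, copy the job's
# rows plus only the filename/path rows they reference, then drop it.
# Table choices and SQL are illustrative assumptions, not the real tool.
def snapshot_sql(jobid):
    schema = "job_%d" % jobid  # named as in the posting: job_%d % (jobid)
    return [
        "CREATE SCHEMA %s;" % schema,
        "CREATE TABLE %s.file AS SELECT * FROM public.file"
        " WHERE jobid = %d;" % (schema, jobid),
        "CREATE TABLE %s.filename AS SELECT * FROM public.filename"
        " WHERE filenameid IN (SELECT filenameid FROM %s.file);"
        % (schema, schema),
        "CREATE TABLE %s.path AS SELECT * FROM public.path"
        " WHERE pathid IN (SELECT pathid FROM %s.file);" % (schema, schema),
        # ...dump here with pg_dump --schema=<schema>, then clean up:
        "DROP SCHEMA %s CASCADE;" % schema,
    ]

for stmt in snapshot_sql(60):
    print(stmt)
```

Copying with IN-subqueries rather than plain joins keeps each filename and
path row appearing once in the temporary schema.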
Re: [Bacula-users] Catalogue snapshot utility : any interest?
On 04/10/10, Phil Stracchino (ala...@metrocast.net) wrote:
> On 10/04/10 07:22, Rory Campbell-Lange wrote:
> > I have developed a catalogue snapshot facility in python to snapshot
> > one job's catalogue and dump it to disk. The snapshot provides a bacula
> > database schema file, a database dump of the job's data, and a file
> > listing showing information such as the tape number, path, file, md5
> > and lstat.
> > ...
> > This is fairly simple stuff, but if this functionality is useful to
> > you, do let me know and I can share the programme with you.
>
> This sounds like a useful tool for any Bacula site that's managing Bacula
> backups for a large number of clients.

Hi Phil

I'd be delighted if you could take a look at the python script and give me
your comments. It is part of the small .tgz archive here:

    http://campbell-lange.net/media/files/bacula_tools_01.tgz

Please **do not** run it on a production PostgreSQL database.

Note that big backups (one with more than 7 million files, say) may take
up to 45 minutes to process.

If you are able to get the system to operate and you think it is useful
I'll stick the script on Bitbucket.

Regards
Rory

-- 
Rory Campbell-Lange
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
Re: [Bacula-users] Catalogue snapshot utility : any interest?
On 10/04/10 08:01, Rory Campbell-Lange wrote:
> Hi Phil
>
> I'd be delighted if you could take a look at the python script and give
> me your comments.

I really can't help with testing it, sorry. I don't run PostgreSQL and
don't speak Python. ;)

-- 
Phil Stracchino, CDK#2
DoD#299792458 ICBM: 43.5607, -71.355
ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, Free Stater
It's not the years, it's the mileage.
Re: [Bacula-users] Catalogue snapshot utility : any interest?
> I have developed a catalogue snapshot facility in python to snapshot one
> job's catalogue and dump it to disk. The snapshot provides a bacula
> database schema file, a database dump of the job's data, and a file
> listing showing information such as the tape number, path, file, md5 and
> lstat. We intend to include the catalogue in compressed format on CDs
> accompanying tape sets to assist our clients in retrieving data in future
> if required.
>
> At present the system works only for PostgreSQL, and for our setup, which
> has the director, storage and file daemons on the same Linux server.
>
> How it works:
>
> * A temporary schema is made in postgres, named job_%d % (jobid)
> * Relevant data is selected from the public schema to the temporary schema
> * The file listing is output
> * The public schema is dumped
> * The temporary schema is dumped
> * The temporary schema is removed
>
> I'm considering making an SQLite database from the temporary schema to
> obviate the need for the public schema file and file listing.
>
> This is fairly simple stuff, but if this functionality is useful to you,
> do let me know and I can share the programme with you.

How much smaller is the catalogue subset vs the full catalogue?

James