Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
Hello Dan, On 10/3/23 21:01, Dan Langille wrote: On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote: Hello Dan, On 9/30/23 15:54, Dan Langille wrote: Basically, each time you insert something, you need to query the other tables to check the existence or not of records. The checks for the existence of the records is already done at the bacula level one time per job, no real need to do it one time per insertion/update/delete. Is that one-time-per-job check done via a SELECT on the database? I should explain analyze that one too and see if we can optimize that. Look the code, we get/create the fileset record, pool record, client, device, ... records when we start a job or load the configuration. For Path, the query itself ensures that we have the proper records, it's builtin. For example, one index I have been using this index for years. I find it referenced[1] in the the 5.x documentation, but it is not part of the catalog creation. "file_jobid_idx" btree (jobid) This index vastly improves the construction of the files, often going from hours to seconds. I don't recall when that index was added here, but building trees has never been an issue here. This index is great, but it uses GB of space and it has a cost during the insertion of the attributes. PostgreSQL can use the composed index on JobId, PathId instead to locate records, so you may add it back if you want, but it's not mandatory. How many GB? Do we know? We can easily find out too. bacula=# select pg_table_size('file_jobid_idx'); pg_table_size --- 37937438720 (1 row) That's about 40GB. I'll check the explain analyse and see how things go and come up with cost. Right now, we have no data. Some might say: if the cost > 0 it's not worth it. However, it is worth it, IMO. Databases are much better at enforcing relational integrity than code is. It's what they do. When Marc Cousin and I have worked on the Batch insert procedures, we have tested all these scenario including the FKEY, it was back in 2006 maybe. PostgreSQL has done a lot of progress since and SSD are replacing spinning disks for example, so it might be different, hard to say without a deep analysis with a large data set. Our complete analysis and wide range of tests that we did for a couple of months ended with the code we have today. Best Regards, Eric ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote: > Hello Dan, > > On 9/30/23 15:54, Dan Langille wrote: >> Hello, >> >> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign >> keys are not a new database concept; they've been around for decades. They >> are reliable and robust. > > Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is > bigger than the benefits IMHO. Of course, if someone can run tests on a > multi-billion record database and show us that the cost is around few %, we > can > consider them. Here's hoping someone with that database can step up. > Basically, each time you insert something, you need to query the other tables > to > check the existence or not of records. The checks for the existence of the > records is already done at the bacula level one time per job, no real need to > do > it one time per insertion/update/delete. Is that one-time-per-job check done via a SELECT on the database? I should explain analyze that one too and see if we can optimize that. >> Wednesday, I started a dbcheck on a Bacula database. Granted, that database >> is 19 years old and this is the first time I've run dbcheck (as far as I >> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; >> lots of cruft removal. >> >> When PostgreSQL was first added to Bacula, there was resistance to FK, and I >> did not pursue the issue. Thus, it persists to this day. I hope to change >> that. > > I don't think they are necessary, we use transaction in the cleanup now days, > so > orphan records should not exist anymore. > >> I would like to take that development work back up (pun intended), and start >> adding foreign keys back into Bacula, at least for PostgresQL. That might >> remove the need for dbcheck (again, at least for Bacula on PostgreSQL). > > dbcheck is needed from time to time to purge the Path table mostly, other > tables > should be ok with a recent version. Not sure the algorithms in dbcheck are > always the best ones to find a remove records. > >> For example, one index I have been using this index for years. I find it >> referenced[1] in the the 5.x documentation, but it is not part of the >> catalog creation. >> >> "file_jobid_idx" btree (jobid) >> >> This index vastly improves the construction of the files, often going from >> hours to seconds. I don't recall when that index was added here, but >> building trees has never been an issue here. > > This index is great, but it uses GB of space and it has a cost during the > insertion of the attributes. PostgreSQL can use the composed index on JobId, > PathId instead to locate records, so you may add it back if you want, but it's > not mandatory. How many GB? Do we know? We can easily find out too. bacula=# select pg_table_size('file_jobid_idx'); pg_table_size --- 37937438720 (1 row) That's about 40GB. I'll check the explain analyse and see how things go and come up with cost. Right now, we have no data. Some might say: if the cost > 0 it's not worth it. However, it is worth it, IMO. Databases are much better at enforcing relational integrity than code is. It's what they do. Thank you -- Dan Langille d...@langille.org ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
On Tue, Oct 3, 2023, at 1:31 PM, Martin Simmons wrote: >> On Tue, 03 Oct 2023 09:35:27 -0400, Dan Langille said: >> >> On Mon, Oct 2, 2023, at 11:17 AM, Martin Simmons wrote: >> >> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said: >> >> >> >> Hello, >> >> >> >> The Bacula PostgreSQL schema is missing several foreign keys (FK). >> >> Foreign keys are not a new database concept; they've been around for >> >> decades. They are reliable and robust. >> >> >> >> Wednesday, I started a dbcheck on a Bacula database. Granted, that >> >> database is 19 years old and this is the first time I've run dbcheck (as >> >> far as I know). That dbcheck is still going. FYI, the dump to disk is >> >> about 140GB; lots of cruft removal. >> >> >> >> When PostgreSQL was first added to Bacula, there was resistance to FK, >> >> and I did not pursue the issue. Thus, it persists to this day. I hope to >> >> change that. >> >> >> >> I would like to take that development work back up (pun intended), and >> >> start adding foreign keys back into Bacula, at least for PostgresQL. That >> >> might remove the need for dbcheck (again, at least for Bacula on >> >> PostgreSQL). >> > >> > What is the performance cost of foreign keys? >> >> I'm replying so it does not appear as if I am ignoring you. Short answer: I >> don't know. Yet. That is the purpose of my project. > > OK, fair enough. > >> >> I can't answer that in a way which would sound satisfying. I have not >> started the work. I have only my personal experience - My backups seem fast >> enough to me. >> >> It is easy enough to test. There are several ways to optimize foreign keys >> usage. >> >> >> For example, one index I have been using this index for years. I find it >> >> referenced[1] in the the 5.x documentation, but it is not part of the >> >> catalog creation. >> >> >> >> "file_jobid_idx" btree (jobid) >> >> >> >> This index vastly improves the construction of the files, often going >> >> from hours to seconds. I don't recall when that index was added here, but >> >> building trees has never been an issue here. >> > >> > It was removed in this change: >> > >> > commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356 >> > Author: Eric Bollengier >> > Date: Mon May 11 17:11:40 2020 +0200 >> > >> > BEE Backport bacula/src/cats/make_postgresql_tables.in >> > >> > Does PostgreSQL use file_jpfid_idx for the query if you don't have >> > file_jobid_idx? >> >> Testing will show that. I am not at that stage yet. I will be examining the >> queries used and running them through the PostgreSQL 'EXPLAIN ANALYZE' >> process. I'll post results at https://explain.depesz.com so progress can be >> seen and compared. Others will be able to run the same non-destructive >> commands on their own databases for comparison. >> >> I just check my database and it has these row counts: >> >> filename:22,232,549 >> file: 1,208,708,804 >> path: 8,340,411 >> job: 97,139 >> jobmedia: 331,379 >> media: 12,848 > > Is this an old version? The filename table shouldn't exist now. The new > catalog format could make a big difference to the queries (and foreign key > performance). Yes, it's Bacula 9... for shame... -- Dan Langille d...@langille.org ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
Hello Dan, On 9/30/23 15:54, Dan Langille wrote: Hello, The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign keys are not a new database concept; they've been around for decades. They are reliable and robust. Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is bigger than the benefits IMHO. Of course, if someone can run tests on a multi-billion record database and show us that the cost is around few %, we can consider them. Basically, each time you insert something, you need to query the other tables to check the existence or not of records. The checks for the existence of the records is already done at the bacula level one time per job, no real need to do it one time per insertion/update/delete. Wednesday, I started a dbcheck on a Bacula database. Granted, that database is 19 years old and this is the first time I've run dbcheck (as far as I know). That dbcheck is still going. FYI, the dump to disk is about 140GB; lots of cruft removal. When PostgreSQL was first added to Bacula, there was resistance to FK, and I did not pursue the issue. Thus, it persists to this day. I hope to change that. I don't think they are necessary, we use transaction in the cleanup now days, so orphan records should not exist anymore. I would like to take that development work back up (pun intended), and start adding foreign keys back into Bacula, at least for PostgresQL. That might remove the need for dbcheck (again, at least for Bacula on PostgreSQL). dbcheck is needed from time to time to purge the Path table mostly, other tables should be ok with a recent version. Not sure the algorithms in dbcheck are always the best ones to find a remove records. For example, one index I have been using this index for years. I find it referenced[1] in the the 5.x documentation, but it is not part of the catalog creation. "file_jobid_idx" btree (jobid) This index vastly improves the construction of the files, often going from hours to seconds. I don't recall when that index was added here, but building trees has never been an issue here. This index is great, but it uses GB of space and it has a cost during the insertion of the attributes. PostgreSQL can use the composed index on JobId, PathId instead to locate records, so you may add it back if you want, but it's not mandatory. I have no timeline for this work, but just posting the intent often gives me the incentive to get started. 1 - https://www.bacula.org/5.2.x-manuals/en/main/main/Catalog_Maintenance.html Best Regards, Eric ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
> On Tue, 03 Oct 2023 09:35:27 -0400, Dan Langille said: > > On Mon, Oct 2, 2023, at 11:17 AM, Martin Simmons wrote: > >> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said: > >> > >> Hello, > >> > >> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign > >> keys are not a new database concept; they've been around for decades. They > >> are reliable and robust. > >> > >> Wednesday, I started a dbcheck on a Bacula database. Granted, that > >> database is 19 years old and this is the first time I've run dbcheck (as > >> far as I know). That dbcheck is still going. FYI, the dump to disk is > >> about 140GB; lots of cruft removal. > >> > >> When PostgreSQL was first added to Bacula, there was resistance to FK, and > >> I did not pursue the issue. Thus, it persists to this day. I hope to > >> change that. > >> > >> I would like to take that development work back up (pun intended), and > >> start adding foreign keys back into Bacula, at least for PostgresQL. That > >> might remove the need for dbcheck (again, at least for Bacula on > >> PostgreSQL). > > > > What is the performance cost of foreign keys? > > I'm replying so it does not appear as if I am ignoring you. Short answer: I > don't know. Yet. That is the purpose of my project. OK, fair enough. > > I can't answer that in a way which would sound satisfying. I have not started > the work. I have only my personal experience - My backups seem fast enough to > me. > > It is easy enough to test. There are several ways to optimize foreign keys > usage. > > >> For example, one index I have been using this index for years. I find it > >> referenced[1] in the the 5.x documentation, but it is not part of the > >> catalog creation. > >> > >> "file_jobid_idx" btree (jobid) > >> > >> This index vastly improves the construction of the files, often going from > >> hours to seconds. I don't recall when that index was added here, but > >> building trees has never been an issue here. > > > > It was removed in this change: > > > > commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356 > > Author: Eric Bollengier > > Date: Mon May 11 17:11:40 2020 +0200 > > > > BEE Backport bacula/src/cats/make_postgresql_tables.in > > > > Does PostgreSQL use file_jpfid_idx for the query if you don't have > > file_jobid_idx? > > Testing will show that. I am not at that stage yet. I will be examining the > queries used and running them through the PostgreSQL 'EXPLAIN ANALYZE' > process. I'll post results at https://explain.depesz.com so progress can be > seen and compared. Others will be able to run the same non-destructive > commands on their own databases for comparison. > > I just check my database and it has these row counts: > > filename:22,232,549 > file: 1,208,708,804 > path: 8,340,411 > job: 97,139 > jobmedia: 331,379 > media: 12,848 Is this an old version? The filename table shouldn't exist now. The new catalog format could make a big difference to the queries (and foreign key performance). __Martin ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
On Mon, Oct 2, 2023, at 11:17 AM, Martin Simmons wrote: >> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said: >> >> Hello, >> >> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign >> keys are not a new database concept; they've been around for decades. They >> are reliable and robust. >> >> Wednesday, I started a dbcheck on a Bacula database. Granted, that database >> is 19 years old and this is the first time I've run dbcheck (as far as I >> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; >> lots of cruft removal. >> >> When PostgreSQL was first added to Bacula, there was resistance to FK, and I >> did not pursue the issue. Thus, it persists to this day. I hope to change >> that. >> >> I would like to take that development work back up (pun intended), and start >> adding foreign keys back into Bacula, at least for PostgresQL. That might >> remove the need for dbcheck (again, at least for Bacula on PostgreSQL). > > What is the performance cost of foreign keys? I'm replying so it does not appear as if I am ignoring you. Short answer: I don't know. Yet. That is the purpose of my project. I can't answer that in a way which would sound satisfying. I have not started the work. I have only my personal experience - My backups seem fast enough to me. It is easy enough to test. There are several ways to optimize foreign keys usage. >> For example, one index I have been using this index for years. I find it >> referenced[1] in the the 5.x documentation, but it is not part of the >> catalog creation. >> >> "file_jobid_idx" btree (jobid) >> >> This index vastly improves the construction of the files, often going from >> hours to seconds. I don't recall when that index was added here, but >> building trees has never been an issue here. > > It was removed in this change: > > commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356 > Author: Eric Bollengier > Date: Mon May 11 17:11:40 2020 +0200 > > BEE Backport bacula/src/cats/make_postgresql_tables.in > > Does PostgreSQL use file_jpfid_idx for the query if you don't have > file_jobid_idx? Testing will show that. I am not at that stage yet. I will be examining the queries used and running them through the PostgreSQL 'EXPLAIN ANALYZE' process. I'll post results at https://explain.depesz.com so progress can be seen and compared. Others will be able to run the same non-destructive commands on their own databases for comparison. I just check my database and it has these row counts: filename:22,232,549 file: 1,208,708,804 path: 8,340,411 job: 97,139 jobmedia: 331,379 media: 12,848 If someone feels this database is not representative of their use case, as mentioned above, they will be able to run the query tests on their own databases and provide feedback. Hope this helps. -- Dan Langille d...@langille.org ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said: > > Hello, > > The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign > keys are not a new database concept; they've been around for decades. They > are reliable and robust. > > Wednesday, I started a dbcheck on a Bacula database. Granted, that database > is 19 years old and this is the first time I've run dbcheck (as far as I > know). That dbcheck is still going. FYI, the dump to disk is about 140GB; > lots of cruft removal. > > When PostgreSQL was first added to Bacula, there was resistance to FK, and I > did not pursue the issue. Thus, it persists to this day. I hope to change > that. > > I would like to take that development work back up (pun intended), and start > adding foreign keys back into Bacula, at least for PostgresQL. That might > remove the need for dbcheck (again, at least for Bacula on PostgreSQL). What is the performance cost of foreign keys? > For example, one index I have been using this index for years. I find it > referenced[1] in the the 5.x documentation, but it is not part of the catalog > creation. > > "file_jobid_idx" btree (jobid) > > This index vastly improves the construction of the files, often going from > hours to seconds. I don't recall when that index was added here, but building > trees has never been an issue here. It was removed in this change: commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356 Author: Eric Bollengier Date: Mon May 11 17:11:40 2020 +0200 BEE Backport bacula/src/cats/make_postgresql_tables.in Does PostgreSQL use file_jpfid_idx for the query if you don't have file_jobid_idx? __Martin ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
Mandi! Dan Langille In chel di` si favelave... > I have no timeline for this work, but just posting the intent often gives me > the incentive to get started. Thanks for the effort. I can confirm that i've also hit DB performance trouble in postgres, so solving in the 'correct DB way' would be wonderful! -- Against software patents: http://www.freepatents.org - http://no-patents.prosa.it ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck
Hello, The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign keys are not a new database concept; they've been around for decades. They are reliable and robust. Wednesday, I started a dbcheck on a Bacula database. Granted, that database is 19 years old and this is the first time I've run dbcheck (as far as I know). That dbcheck is still going. FYI, the dump to disk is about 140GB; lots of cruft removal. When PostgreSQL was first added to Bacula, there was resistance to FK, and I did not pursue the issue. Thus, it persists to this day. I hope to change that. I would like to take that development work back up (pun intended), and start adding foreign keys back into Bacula, at least for PostgresQL. That might remove the need for dbcheck (again, at least for Bacula on PostgreSQL). For example, one index I have been using this index for years. I find it referenced[1] in the the 5.x documentation, but it is not part of the catalog creation. "file_jobid_idx" btree (jobid) This index vastly improves the construction of the files, often going from hours to seconds. I don't recall when that index was added here, but building trees has never been an issue here. I have no timeline for this work, but just posting the intent often gives me the incentive to get started. 1 - https://www.bacula.org/5.2.x-manuals/en/main/main/Catalog_Maintenance.html -- Dan Langille d...@langille.org ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users