Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-04 Thread Eric Bollengier via Bacula-users

Hello Dan,

On 10/3/23 21:01, Dan Langille wrote:

On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote:

Hello Dan,

On 9/30/23 15:54, Dan Langille wrote:
Basically, each time you insert something, you need to query the other tables to
check the existence or not of records. The checks for the existence of the
records is already done at the bacula level one time per job, no real need to do
it one time per insertion/update/delete.


Is that one-time-per-job check done via a SELECT on the database? I should 
explain analyze that one too and see if we can optimize that.


Look the code, we get/create the fileset record, pool record, client, device,
... records when we start a job or load the configuration.

For Path, the query itself ensures that we have the proper records, it's
builtin.


For example, one index I have been using this index for years. I find it 
referenced[1] in the the 5.x documentation, but it is not part of the catalog 
creation.

  "file_jobid_idx" btree (jobid)

This index vastly improves the construction of the files, often going from 
hours to seconds. I don't recall when that index was added here, but building 
trees has never been an issue here.


This index is great, but it uses GB of space and it has a cost during the
insertion of the attributes. PostgreSQL can use the composed index on JobId,
PathId instead to locate records, so you may add it back if you want, but it's
not mandatory.


How many GB? Do we know? We can easily find out too.

bacula=# select pg_table_size('file_jobid_idx');
  pg_table_size
---
37937438720
(1 row)

That's about 40GB.  I'll check the explain analyse and see how things go and 
come up with cost. Right now, we have no data.

Some might say: if the cost > 0 it's not worth it. However, it is worth it, 
IMO. Databases are much better at enforcing relational integrity than code is. 
It's what they do.


When Marc Cousin and I have worked on the Batch insert procedures, we have
tested all these scenario including the FKEY, it was back in 2006 maybe.
PostgreSQL has done a lot of progress since and SSD are replacing spinning disks
for example, so it might be different, hard to say without a deep analysis with
a large data set.

Our complete analysis and wide range of tests that we did for a couple of months
ended with the code we have today.

Best Regards,
Eric


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-03 Thread Dan Langille
On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote:
> Hello Dan,
>
> On 9/30/23 15:54, Dan Langille wrote:
>> Hello,
>> 
>> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign 
>> keys are not a new database concept; they've been around for decades. They 
>> are reliable and robust.
>
> Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is
> bigger than the benefits IMHO. Of course, if someone can run tests on a 
> multi-billion record database and show us that the cost is around few %, we 
> can
> consider them.

Here's hoping someone with that database can step up.

> Basically, each time you insert something, you need to query the other tables 
> to
> check the existence or not of records. The checks for the existence of the
> records is already done at the bacula level one time per job, no real need to 
> do
> it one time per insertion/update/delete.

Is that one-time-per-job check done via a SELECT on the database? I should 
explain analyze that one too and see if we can optimize that.

>> Wednesday, I started a dbcheck on a Bacula database. Granted, that database 
>> is 19 years old and this is the first time I've run dbcheck (as far as I 
>> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; 
>> lots of cruft removal.
>> 
>> When PostgreSQL was first added to Bacula, there was resistance to FK, and I 
>> did not pursue the issue. Thus, it persists to this day. I hope to change 
>> that.
>
> I don't think they are necessary, we use transaction in the cleanup now days, 
> so
> orphan records should not exist anymore.
>
>> I would like to take that development work back up (pun intended), and start 
>> adding foreign keys back into Bacula, at least for PostgresQL. That might 
>> remove the need for dbcheck (again, at least for Bacula on PostgreSQL).
>
> dbcheck is needed from time to time to purge the Path table mostly, other 
> tables
> should be ok with a recent version. Not sure the algorithms in dbcheck are
> always the best ones to find a remove records.
>
>> For example, one index I have been using this index for years. I find it 
>> referenced[1] in the the 5.x documentation, but it is not part of the 
>> catalog creation.
>> 
>>  "file_jobid_idx" btree (jobid)
>> 
>> This index vastly improves the construction of the files, often going from 
>> hours to seconds. I don't recall when that index was added here, but 
>> building trees has never been an issue here.
>
> This index is great, but it uses GB of space and it has a cost during the 
> insertion of the attributes. PostgreSQL can use the composed index on JobId,
> PathId instead to locate records, so you may add it back if you want, but it's
> not mandatory.

How many GB? Do we know? We can easily find out too. 

bacula=# select pg_table_size('file_jobid_idx');
 pg_table_size 
---
   37937438720
(1 row)

That's about 40GB.  I'll check the explain analyse and see how things go and 
come up with cost. Right now, we have no data.

Some might say: if the cost > 0 it's not worth it. However, it is worth it, 
IMO. Databases are much better at enforcing relational integrity than code is. 
It's what they do.

Thank you

-- 
  Dan Langille
  d...@langille.org


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-03 Thread Dan Langille



On Tue, Oct 3, 2023, at 1:31 PM, Martin Simmons wrote:
>> On Tue, 03 Oct 2023 09:35:27 -0400, Dan Langille said:
>> 
>> On Mon, Oct 2, 2023, at 11:17 AM, Martin Simmons wrote:
>> >> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said:
>> >> 
>> >> Hello,
>> >> 
>> >> The Bacula PostgreSQL schema is missing several foreign keys (FK). 
>> >> Foreign keys are not a new database concept; they've been around for 
>> >> decades. They are reliable and robust.
>> >> 
>> >> Wednesday, I started a dbcheck on a Bacula database. Granted, that 
>> >> database is 19 years old and this is the first time I've run dbcheck (as 
>> >> far as I know). That dbcheck is still going. FYI, the dump to disk is 
>> >> about 140GB; lots of cruft removal. 
>> >> 
>> >> When PostgreSQL was first added to Bacula, there was resistance to FK, 
>> >> and I did not pursue the issue. Thus, it persists to this day. I hope to 
>> >> change that.
>> >> 
>> >> I would like to take that development work back up (pun intended), and 
>> >> start adding foreign keys back into Bacula, at least for PostgresQL. That 
>> >> might remove the need for dbcheck (again, at least for Bacula on 
>> >> PostgreSQL).
>> >
>> > What is the performance cost of foreign keys?
>> 
>> I'm replying so it does not appear as if I am ignoring you. Short answer: I 
>> don't know. Yet. That is the purpose of my project.
>
> OK, fair enough.
>
>> 
>> I can't answer that in a way which would sound satisfying. I have not 
>> started the work. I have only my personal experience - My backups seem fast 
>> enough to me.
>> 
>> It is easy enough to test. There are several ways to optimize foreign keys 
>> usage.
>> 
>> >> For example, one index I have been using this index for years. I find it 
>> >> referenced[1] in the the 5.x documentation, but it is not part of the 
>> >> catalog creation.
>> >> 
>> >> "file_jobid_idx" btree (jobid)
>> >> 
>> >> This index vastly improves the construction of the files, often going 
>> >> from hours to seconds. I don't recall when that index was added here, but 
>> >> building trees has never been an issue here.
>> >
>> > It was removed in this change:
>> >
>> > commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356
>> > Author: Eric Bollengier 
>> > Date:   Mon May 11 17:11:40 2020 +0200
>> >
>> > BEE Backport bacula/src/cats/make_postgresql_tables.in
>> > 
>> > Does PostgreSQL use file_jpfid_idx for the query if you don't have
>> > file_jobid_idx?
>> 
>> Testing will show that. I am not at that stage yet. I will be examining the 
>> queries used and running them through the PostgreSQL 'EXPLAIN ANALYZE' 
>> process. I'll post results at https://explain.depesz.com so progress can be 
>> seen and compared. Others will be able to run the same non-destructive 
>> commands on their own databases for comparison.
>> 
>> I just check my database and it has these row counts:
>> 
>> filename:22,232,549
>> file: 1,208,708,804
>> path: 8,340,411
>> job: 97,139
>> jobmedia:   331,379
>> media:   12,848
>
> Is this an old version?  The filename table shouldn't exist now.  The new
> catalog format could make a big difference to the queries (and foreign key
> performance).

Yes, it's Bacula 9... for shame...
-- 
  Dan Langille
  d...@langille.org


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-03 Thread Eric Bollengier via Bacula-users

Hello Dan,

On 9/30/23 15:54, Dan Langille wrote:

Hello,

The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign keys 
are not a new database concept; they've been around for decades. They are 
reliable and robust.


Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is
bigger than the benefits IMHO. Of course, if someone can run tests on a 
multi-billion record database and show us that the cost is around few %, we can

consider them.

Basically, each time you insert something, you need to query the other tables to
check the existence or not of records. The checks for the existence of the
records is already done at the bacula level one time per job, no real need to do
it one time per insertion/update/delete.


Wednesday, I started a dbcheck on a Bacula database. Granted, that database is 
19 years old and this is the first time I've run dbcheck (as far as I know). 
That dbcheck is still going. FYI, the dump to disk is about 140GB; lots of 
cruft removal.

When PostgreSQL was first added to Bacula, there was resistance to FK, and I 
did not pursue the issue. Thus, it persists to this day. I hope to change that.


I don't think they are necessary, we use transaction in the cleanup now days, so
orphan records should not exist anymore.


I would like to take that development work back up (pun intended), and start 
adding foreign keys back into Bacula, at least for PostgresQL. That might 
remove the need for dbcheck (again, at least for Bacula on PostgreSQL).


dbcheck is needed from time to time to purge the Path table mostly, other tables
should be ok with a recent version. Not sure the algorithms in dbcheck are
always the best ones to find a remove records.


For example, one index I have been using this index for years. I find it 
referenced[1] in the the 5.x documentation, but it is not part of the catalog 
creation.

 "file_jobid_idx" btree (jobid)

This index vastly improves the construction of the files, often going from 
hours to seconds. I don't recall when that index was added here, but building 
trees has never been an issue here.


This index is great, but it uses GB of space and it has a cost during the 
insertion of the attributes. PostgreSQL can use the composed index on JobId,

PathId instead to locate records, so you may add it back if you want, but it's
not mandatory.


I have no timeline for this work, but just posting the intent often gives me 
the incentive to get started.



1 - https://www.bacula.org/5.2.x-manuals/en/main/main/Catalog_Maintenance.html



Best Regards,
Eric


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-03 Thread Martin Simmons
> On Tue, 03 Oct 2023 09:35:27 -0400, Dan Langille said:
> 
> On Mon, Oct 2, 2023, at 11:17 AM, Martin Simmons wrote:
> >> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said:
> >> 
> >> Hello,
> >> 
> >> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign 
> >> keys are not a new database concept; they've been around for decades. They 
> >> are reliable and robust.
> >> 
> >> Wednesday, I started a dbcheck on a Bacula database. Granted, that 
> >> database is 19 years old and this is the first time I've run dbcheck (as 
> >> far as I know). That dbcheck is still going. FYI, the dump to disk is 
> >> about 140GB; lots of cruft removal. 
> >> 
> >> When PostgreSQL was first added to Bacula, there was resistance to FK, and 
> >> I did not pursue the issue. Thus, it persists to this day. I hope to 
> >> change that.
> >> 
> >> I would like to take that development work back up (pun intended), and 
> >> start adding foreign keys back into Bacula, at least for PostgresQL. That 
> >> might remove the need for dbcheck (again, at least for Bacula on 
> >> PostgreSQL).
> >
> > What is the performance cost of foreign keys?
> 
> I'm replying so it does not appear as if I am ignoring you. Short answer: I 
> don't know. Yet. That is the purpose of my project.

OK, fair enough.

> 
> I can't answer that in a way which would sound satisfying. I have not started 
> the work. I have only my personal experience - My backups seem fast enough to 
> me.
> 
> It is easy enough to test. There are several ways to optimize foreign keys 
> usage.
> 
> >> For example, one index I have been using this index for years. I find it 
> >> referenced[1] in the the 5.x documentation, but it is not part of the 
> >> catalog creation.
> >> 
> >> "file_jobid_idx" btree (jobid)
> >> 
> >> This index vastly improves the construction of the files, often going from 
> >> hours to seconds. I don't recall when that index was added here, but 
> >> building trees has never been an issue here.
> >
> > It was removed in this change:
> >
> > commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356
> > Author: Eric Bollengier 
> > Date:   Mon May 11 17:11:40 2020 +0200
> >
> > BEE Backport bacula/src/cats/make_postgresql_tables.in
> > 
> > Does PostgreSQL use file_jpfid_idx for the query if you don't have
> > file_jobid_idx?
> 
> Testing will show that. I am not at that stage yet. I will be examining the 
> queries used and running them through the PostgreSQL 'EXPLAIN ANALYZE' 
> process. I'll post results at https://explain.depesz.com so progress can be 
> seen and compared. Others will be able to run the same non-destructive 
> commands on their own databases for comparison.
> 
> I just check my database and it has these row counts:
> 
> filename:22,232,549
> file: 1,208,708,804
> path: 8,340,411
> job: 97,139
> jobmedia:   331,379
> media:   12,848

Is this an old version?  The filename table shouldn't exist now.  The new
catalog format could make a big difference to the queries (and foreign key
performance).

__Martin


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-03 Thread Dan Langille
On Mon, Oct 2, 2023, at 11:17 AM, Martin Simmons wrote:
>> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said:
>> 
>> Hello,
>> 
>> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign 
>> keys are not a new database concept; they've been around for decades. They 
>> are reliable and robust.
>> 
>> Wednesday, I started a dbcheck on a Bacula database. Granted, that database 
>> is 19 years old and this is the first time I've run dbcheck (as far as I 
>> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; 
>> lots of cruft removal. 
>> 
>> When PostgreSQL was first added to Bacula, there was resistance to FK, and I 
>> did not pursue the issue. Thus, it persists to this day. I hope to change 
>> that.
>> 
>> I would like to take that development work back up (pun intended), and start 
>> adding foreign keys back into Bacula, at least for PostgresQL. That might 
>> remove the need for dbcheck (again, at least for Bacula on PostgreSQL).
>
> What is the performance cost of foreign keys?

I'm replying so it does not appear as if I am ignoring you. Short answer: I 
don't know. Yet. That is the purpose of my project.

I can't answer that in a way which would sound satisfying. I have not started 
the work. I have only my personal experience - My backups seem fast enough to 
me.

It is easy enough to test. There are several ways to optimize foreign keys 
usage.

>> For example, one index I have been using this index for years. I find it 
>> referenced[1] in the the 5.x documentation, but it is not part of the 
>> catalog creation.
>> 
>> "file_jobid_idx" btree (jobid)
>> 
>> This index vastly improves the construction of the files, often going from 
>> hours to seconds. I don't recall when that index was added here, but 
>> building trees has never been an issue here.
>
> It was removed in this change:
>
> commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356
> Author: Eric Bollengier 
> Date:   Mon May 11 17:11:40 2020 +0200
>
> BEE Backport bacula/src/cats/make_postgresql_tables.in
> 
> Does PostgreSQL use file_jpfid_idx for the query if you don't have
> file_jobid_idx?

Testing will show that. I am not at that stage yet. I will be examining the 
queries used and running them through the PostgreSQL 'EXPLAIN ANALYZE' process. 
I'll post results at https://explain.depesz.com so progress can be seen and 
compared. Others will be able to run the same non-destructive commands on their 
own databases for comparison.

I just check my database and it has these row counts:

filename:22,232,549
file: 1,208,708,804
path: 8,340,411
job: 97,139
jobmedia:   331,379
media:   12,848

If someone feels this database is not representative of their use case, as 
mentioned above, they will be able to run the query tests on their own 
databases and provide feedback.

Hope this helps.

-- 
  Dan Langille
  d...@langille.org


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-02 Thread Martin Simmons
> On Sat, 30 Sep 2023 09:54:51 -0400, Dan Langille said:
> 
> Hello,
> 
> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign 
> keys are not a new database concept; they've been around for decades. They 
> are reliable and robust.
> 
> Wednesday, I started a dbcheck on a Bacula database. Granted, that database 
> is 19 years old and this is the first time I've run dbcheck (as far as I 
> know). That dbcheck is still going. FYI, the dump to disk is about 140GB; 
> lots of cruft removal. 
> 
> When PostgreSQL was first added to Bacula, there was resistance to FK, and I 
> did not pursue the issue. Thus, it persists to this day. I hope to change 
> that.
> 
> I would like to take that development work back up (pun intended), and start 
> adding foreign keys back into Bacula, at least for PostgresQL. That might 
> remove the need for dbcheck (again, at least for Bacula on PostgreSQL).

What is the performance cost of foreign keys?


> For example, one index I have been using this index for years. I find it 
> referenced[1] in the the 5.x documentation, but it is not part of the catalog 
> creation.
> 
> "file_jobid_idx" btree (jobid)
> 
> This index vastly improves the construction of the files, often going from 
> hours to seconds. I don't recall when that index was added here, but building 
> trees has never been an issue here.

It was removed in this change:

commit 740704c9c66d0b049a7cd548ac1204ef1aaf7356
Author: Eric Bollengier 
Date:   Mon May 11 17:11:40 2020 +0200

BEE Backport bacula/src/cats/make_postgresql_tables.in
 
Does PostgreSQL use file_jpfid_idx for the query if you don't have
file_jobid_idx?

__Martin


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-10-02 Thread Marco Gaiarin
Mandi! Dan Langille
  In chel di` si favelave...

> I have no timeline for this work, but just posting the intent often gives me 
> the incentive to get started.

Thanks for the effort. I can confirm that i've also hit DB performance
trouble in postgres, so solving in the 'correct DB way' would be wonderful!

-- 
  Against software patents:
  http://www.freepatents.org - http://no-patents.prosa.it




___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] Adding missing foreign keys to PostgreSQL - avoiding dbcheck

2023-09-30 Thread Dan Langille
Hello,

The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign keys 
are not a new database concept; they've been around for decades. They are 
reliable and robust.

Wednesday, I started a dbcheck on a Bacula database. Granted, that database is 
19 years old and this is the first time I've run dbcheck (as far as I know). 
That dbcheck is still going. FYI, the dump to disk is about 140GB; lots of 
cruft removal. 

When PostgreSQL was first added to Bacula, there was resistance to FK, and I 
did not pursue the issue. Thus, it persists to this day. I hope to change that.

I would like to take that development work back up (pun intended), and start 
adding foreign keys back into Bacula, at least for PostgresQL. That might 
remove the need for dbcheck (again, at least for Bacula on PostgreSQL).

For example, one index I have been using this index for years. I find it 
referenced[1] in the the 5.x documentation, but it is not part of the catalog 
creation.

"file_jobid_idx" btree (jobid)

This index vastly improves the construction of the files, often going from 
hours to seconds. I don't recall when that index was added here, but building 
trees has never been an issue here.

I have no timeline for this work, but just posting the intent often gives me 
the incentive to get started.

1 - https://www.bacula.org/5.2.x-manuals/en/main/main/Catalog_Maintenance.html

-- 
  Dan Langille
  d...@langille.org


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users