Re: Database referral integrity
The database migration creating the FK will/would need to have something that either creates dummy missing PKs first, or delete the orphaned keys to insure the operation of creating the FK doesn't error out. Seems like adding dummy keys is a better approach. Then you'll have to make sure that everywhere where FKs are created that there are no edge cases of missing PKs. Then some delete operations in some cases may have to "cascade" properly. The Django Admin had this nice confirm screen on delete that would show you clearly the scope of the cascading operation when deleting objects. To my knowledge Flask-Admin and FAB don't have such a feature. Personally I wouldn't allow cascade unless such a feature is implemented in some way. Note that SQLAlchemy has builtin semantics for specifying how/whether cascading should take place. Personally I think referential integrity is overrated in some cases, especially when using meaningful "business keys" (as opposed to auto-increted "surrogate" keys) as PKs. It also slows down insert operations. For data warehousing (this clearly doesn't apply to the Airflow metadata store), the best practice on most db engine is to **not** enforce FK constraints as it slows down inserts in fact tables and straight out prevents bulk loading. Another approach is to avoid deleting in general, especially referenced fks, and setting up some activity/visibility flag to false instead. Max On Tue, Sep 18, 2018 at 10:47 AM Driesprong, Fokko wrote: > I'm in favor of having referential integrity. It will add some load in > having to enforce the referential integrity, but it will also make sure > that the database stays clean. Also in Airflow we use transactions which > will make sure that the integrity checks are not validated on every > statement, but after the commit. I'm happy to help with this as well. > > Cheers, Fokko > > Op di 18 sep. 2018 om 11:07 schreef Bolke de Bruin : > > > Adding these kind of checks which work for integrity well make database > > access pretty slow. In addition it isnt there because in the past there > was > > no strong connection between for example tasks and dagruns, it was more > or > > less just coincidental. There also so some bisecting tools that probably > > have difficulty functioning in a new regime. In other words it is not an > > easy change and it will have operational challenges. > > > > > On 18 Sep 2018, at 11:03, Ash Berlin-Taylor wrote: > > > > > > Ooh good spot. > > > > > > Yes I would be in favour of adding these, but as you say we need to > > thing about how we might migrate old data. > > > > > > Doing this at 2.0.0 and providing a cleanup script (or doing it as part > > of the migration?) is probably the way to go. > > > > > > -ash- > > > > > >> On 17 Sep 2018, at 19:56, Stefan Seelmann > > wrote: > > >> > > >> Hi, > > >> > > >> looking into the DB schema there is almost no referral integrity > > >> enforced at the database level. Many foreign key constraints between > > >> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make > sense > > >> IMO. > > >> > > >> Is there a particular reason why that's not implemented? > > >> > > >> Introducing it now will be hard, probably any real-world setup has > some > > >> violations. But I'm still in favor of this additional safety net. > > >> > > >> Kind Regards, > > >> Stefan > > > > > > > >
Re: Database referral integrity
I'm in favor of having referential integrity. It will add some load in having to enforce the referential integrity, but it will also make sure that the database stays clean. Also in Airflow we use transactions which will make sure that the integrity checks are not validated on every statement, but after the commit. I'm happy to help with this as well. Cheers, Fokko Op di 18 sep. 2018 om 11:07 schreef Bolke de Bruin : > Adding these kind of checks which work for integrity well make database > access pretty slow. In addition it isnt there because in the past there was > no strong connection between for example tasks and dagruns, it was more or > less just coincidental. There also so some bisecting tools that probably > have difficulty functioning in a new regime. In other words it is not an > easy change and it will have operational challenges. > > > On 18 Sep 2018, at 11:03, Ash Berlin-Taylor wrote: > > > > Ooh good spot. > > > > Yes I would be in favour of adding these, but as you say we need to > thing about how we might migrate old data. > > > > Doing this at 2.0.0 and providing a cleanup script (or doing it as part > of the migration?) is probably the way to go. > > > > -ash- > > > >> On 17 Sep 2018, at 19:56, Stefan Seelmann > wrote: > >> > >> Hi, > >> > >> looking into the DB schema there is almost no referral integrity > >> enforced at the database level. Many foreign key constraints between > >> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense > >> IMO. > >> > >> Is there a particular reason why that's not implemented? > >> > >> Introducing it now will be hard, probably any real-world setup has some > >> violations. But I'm still in favor of this additional safety net. > >> > >> Kind Regards, > >> Stefan > > > >
Re: Database referral integrity
Adding these kind of checks which work for integrity well make database access pretty slow. In addition it isnt there because in the past there was no strong connection between for example tasks and dagruns, it was more or less just coincidental. There also so some bisecting tools that probably have difficulty functioning in a new regime. In other words it is not an easy change and it will have operational challenges. > On 18 Sep 2018, at 11:03, Ash Berlin-Taylor wrote: > > Ooh good spot. > > Yes I would be in favour of adding these, but as you say we need to thing > about how we might migrate old data. > > Doing this at 2.0.0 and providing a cleanup script (or doing it as part of > the migration?) is probably the way to go. > > -ash- > >> On 17 Sep 2018, at 19:56, Stefan Seelmann wrote: >> >> Hi, >> >> looking into the DB schema there is almost no referral integrity >> enforced at the database level. Many foreign key constraints between >> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense >> IMO. >> >> Is there a particular reason why that's not implemented? >> >> Introducing it now will be hard, probably any real-world setup has some >> violations. But I'm still in favor of this additional safety net. >> >> Kind Regards, >> Stefan >
Re: Database referral integrity
Yes, and data that is in violation of the logical FKs can probably simply be deleted, since they must be orphaned etc anyway... Get Outlook for Android<https://aka.ms/ghei36> From: Ash Berlin-Taylor Sent: Tuesday, September 18, 2018 2:03:50 AM To: dev@airflow.incubator.apache.org Subject: Re: Database referral integrity Ooh good spot. Yes I would be in favour of adding these, but as you say we need to thing about how we might migrate old data. Doing this at 2.0.0 and providing a cleanup script (or doing it as part of the migration?) is probably the way to go. -ash- > On 17 Sep 2018, at 19:56, Stefan Seelmann wrote: > > Hi, > > looking into the DB schema there is almost no referral integrity > enforced at the database level. Many foreign key constraints between > dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense > IMO. > > Is there a particular reason why that's not implemented? > > Introducing it now will be hard, probably any real-world setup has some > violations. But I'm still in favor of this additional safety net. > > Kind Regards, > Stefan
Re: Database referral integrity
Ooh good spot. Yes I would be in favour of adding these, but as you say we need to thing about how we might migrate old data. Doing this at 2.0.0 and providing a cleanup script (or doing it as part of the migration?) is probably the way to go. -ash- > On 17 Sep 2018, at 19:56, Stefan Seelmann wrote: > > Hi, > > looking into the DB schema there is almost no referral integrity > enforced at the database level. Many foreign key constraints between > dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense > IMO. > > Is there a particular reason why that's not implemented? > > Introducing it now will be hard, probably any real-world setup has some > violations. But I'm still in favor of this additional safety net. > > Kind Regards, > Stefan
Database referral integrity
Hi, looking into the DB schema there is almost no referral integrity enforced at the database level. Many foreign key constraints between dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense IMO. Is there a particular reason why that's not implemented? Introducing it now will be hard, probably any real-world setup has some violations. But I'm still in favor of this additional safety net. Kind Regards, Stefan