[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573618#comment-16573618
 ] 

ASF subversion and git services commented on AIRFLOW-2870:
--

Commit 95aa49a71dcc69d2e9a8e32b69a2a61cacec2b1b in incubator-airflow's branch 
refs/heads/v1-10-test from bolkedebruin
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=95aa49a ]

[AIRFLOW-2870] Use abstract TaskInstance for migration (#3720)

If we use the full model for migration it can have columns
added that are not available yet in the database. Using
an abstraction ensures only the columns that are required
for data migration are present.
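A minimal sketch of the idea in this commit message. It uses Python's built-in `sqlite3` in place of SQLAlchemy so it runs without Airflow installed. The migration keeps its own frozen list of the `task_instance` columns it needs, so a column added by a later migration (here `executor_config`) is never referenced. The column list, table shape, sample row and the helper name `select_for_migration` are illustrative assumptions, not the code merged in #3720.

```python
import sqlite3

# Hypothetical, frozen view of task_instance as this one migration needs it.
# It deliberately omits columns added by later migrations (e.g. executor_config).
MIGRATION_TI_COLUMNS = ("task_id", "dag_id", "execution_date", "try_number", "max_tries")


def select_for_migration(conn, limit):
    """Fetch only the columns the data migration needs, at most `limit` rows."""
    cols = ", ".join(MIGRATION_TI_COLUMNS)
    sql = f"SELECT {cols} FROM task_instance WHERE max_tries = -1 LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Schema as it looks mid-upgrade: max_tries exists, executor_config does not yet.
    conn.execute(
        "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT, "
        "try_number INTEGER, max_tries INTEGER DEFAULT -1)"
    )
    conn.execute(
        "INSERT INTO task_instance (task_id, dag_id, execution_date, try_number) "
        "VALUES ('runme_0', 'example_bash_operator', '2018-01-01', 1)"
    )
    return select_for_migration(conn, limit=100)


if __name__ == "__main__":
    print(demo())
```

Because the column list lives in the migration file, later edits to the full model cannot change what this query selects.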

(cherry picked from commit 546f1cdb5208ba8e1cf3bde36bbdbb639fa20b22)
Signed-off-by: Bolke de Bruin 


> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
>                 Key: AIRFLOW-2870
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: George Leslie-Waksman
>            Priority: Blocker
>
> Running migrations from below cc1e65623dc7_add_max_tries_column_to_task_instance.py fails with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure occurs because cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance
> from the current code version, whose task_instance definition includes changes
> that the migration does not expect.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an
> executor_config column that does not yet exist when
> cc1e65623dc7_add_max_tries_column_to_task_instance.py runs.
> It is worth noting that this will not be observed on new installs, because
> the migration branches on table existence/non-existence at a point that hides
> the issue from new installs.
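The mechanism described in the issue can be reproduced without Airflow. The sketch below uses stdlib `sqlite3` rather than SQLAlchemy and mimics what an ORM mapper does: it builds a SELECT from every column the current model knows about and runs it against a table created by an older schema. The column lists are abbreviated assumptions, not the real `TaskInstance` definition.

```python
import sqlite3

# Columns created by an older schema (abbreviated and illustrative).
OLD_SCHEMA_COLUMNS = ("task_id", "dag_id", "execution_date", "pid")
# Columns the current model maps: the old ones plus one added by a later migration.
CURRENT_MODEL_COLUMNS = OLD_SCHEMA_COLUMNS + ("executor_config",)


def query_like_orm(conn, model_columns):
    """Select every mapped column, qualified by table name, as an ORM mapper would."""
    cols = ", ".join(f"task_instance.{c}" for c in model_columns)
    return conn.execute(f"SELECT {cols} FROM task_instance").fetchall()


def reproduce():
    """Run a current-model query against an old-schema table; return the error text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE task_instance ({', '.join(OLD_SCHEMA_COLUMNS)})")
    try:
        query_like_orm(conn, CURRENT_MODEL_COLUMNS)
    except sqlite3.OperationalError as exc:
        # SQLite's counterpart of psycopg2's "column ... does not exist".
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

The same query built from `OLD_SCHEMA_COLUMNS` succeeds. This is why the error only appears when the mapped model is newer than the schema the migration is running against.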



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573527#comment-16573527
 ] 

ASF GitHub Bot commented on AIRFLOW-2870:
-

bolkedebruin closed pull request #3720: [AIRFLOW-2870] Use abstract 
TaskInstance for migration
URL: https://github.com/apache/incubator-airflow/pull/3720
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py b/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py
index b7213a3031..27a9f593b5 100644
--- a/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py
+++ b/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py
@@ -1,16 +1,22 @@
 # flake8: noqa
 #
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
 #
-# http://www.apache.org/licenses/LICENSE-2.0
+#   http://www.apache.org/licenses/LICENSE-2.0
 #
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 
 """kubernetes_resource_checkpointing
 
diff --git a/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py b/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py
index 4347bae92a..c489c05f7e 100644
--- a/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py
+++ b/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py
@@ -1,16 +1,22 @@
 # flake8: noqa
 #
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
 #
-# http://www.apache.org/licenses/LICENSE-2.0
+#   http://www.apache.org/licenses/LICENSE-2.0
 #
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 
 """kubernetes_resource_checkpointing
 
diff --git a/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py b/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py
index 6bc48f1105..5c921c6a98 100644
--- a/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py
+++ b/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py
@@ -1,16 +1,22 @@
 # flake8: noqa
 #
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more 

[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573528#comment-16573528
 ] 

ASF subversion and git services commented on AIRFLOW-2870:
--

Commit 546f1cdb5208ba8e1cf3bde36bbdbb639fa20b22 in incubator-airflow's branch 
refs/heads/master from bolkedebruin
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=546f1cd ]

[AIRFLOW-2870] Use abstract TaskInstance for migration (#3720)

If we use the full model for migration it can have columns
added that are not available yet in the database. Using
an abstraction ensures only the columns that are required
for data migration are present.



[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573041#comment-16573041
 ] 

ASF GitHub Bot commented on AIRFLOW-2870:
-

bolkedebruin opened a new pull request #3720: [AIRFLOW-2870] Use abstract 
TaskInstance for migration
URL: https://github.com/apache/incubator-airflow/pull/3720
 
 
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2870
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   
   If we use the full model for migration it can have columns
   added that are not available yet in the database. Using
   an abstraction ensures only the columns that are required
   for data migration are present.
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Db migration
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   
   @ashb @gwax PTAL
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572906#comment-16572906
 ] 

Bolke de Bruin commented on AIRFLOW-2870:
-

Or use with_entities; trying that.



[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572883#comment-16572883
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2870:


Direct SQL might be an option, or 
https://stackoverflow.com/questions/24612395/how-do-i-execute-inserts-and-updates-in-an-alembic-upgrade-script
 suggests defining a model in the migration file directly, rather than 
importing one.



[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572876#comment-16572876
 ] 

Bolke de Bruin commented on AIRFLOW-2870:
-

Gotcha. That is the weakness of using the ORM in Alembic migrations. Column
loading might be an option, as we do not need the full model. Or, instead of
using the database as a reference, use the dagbag as a reference and update
using direct SQL.
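A rough sketch of the second option described here: treat the DAG definitions as the source of truth and backfill with direct SQL, in batches. It assumes the migration's job is to replace the `-1` `max_tries` sentinel with each task's `retries`. The `retries_by_task` dict stands in for a DagBag lookup, and the batch size, fallback value and sample rows are hypothetical. `sqlite3` is used so the example runs standalone.

```python
import sqlite3

BATCH_SIZE = 2  # hypothetical; kept small so the demo needs several batches


def backfill_max_tries(conn, retries_by_task, fallback=0):
    """Replace the -1 sentinel in max_tries using plain SQL, one batch at a time.

    retries_by_task maps (dag_id, task_id) -> retries and stands in for a DagBag
    lookup; rows whose task is not found get `fallback`. Paging by SQLite's rowid
    (an assumption; another backend would page by primary key) guarantees the
    loop ends even if a looked-up value happens to equal -1.
    """
    last_rowid = 0
    while True:
        rows = conn.execute(
            "SELECT rowid, dag_id, task_id FROM task_instance "
            "WHERE max_tries = -1 AND rowid > ? ORDER BY rowid LIMIT ?",
            (last_rowid, BATCH_SIZE),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE task_instance SET max_tries = ? WHERE rowid = ?",
            [
                (retries_by_task.get((dag_id, task_id), fallback), rowid)
                for rowid, dag_id, task_id in rows
            ],
        )
        conn.commit()  # one transaction per batch
        last_rowid = rows[-1][0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, max_tries INTEGER DEFAULT -1)"
    )
    conn.executemany(
        "INSERT INTO task_instance (dag_id, task_id) VALUES (?, ?)",
        [
            ("example_bash_operator", "runme_0"),
            ("example_bash_operator", "runme_1"),
            ("deleted_dag", "gone"),  # no longer present in the DAG definitions
        ],
    )
    retries = {
        ("example_bash_operator", "runme_0"): 3,
        ("example_bash_operator", "runme_1"): 1,
    }
    backfill_max_tries(conn, retries)
    return conn.execute("SELECT task_id, max_tries FROM task_instance ORDER BY rowid").fetchall()


if __name__ == "__main__":
    print(demo())
```

Because only `dag_id`, `task_id` and `max_tries` are named, columns added by later migrations never enter the query.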



[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread George Leslie-Waksman (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572857#comment-16572857
 ] 

George Leslie-Waksman commented on AIRFLOW-2870:


If we instead upgrade from 1.8.1 -> 1.9.0 -> 1.10rc3, we do not run into a
problem, because 1.9.0 falls between
{{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} and
{{27c6a30d7c24_add_executor_config_to_task_instance.py}}: the first migration
runs under 1.9.0 code, whose {{TaskInstance}} does not yet have {{executor_config}}.

{noformat}
cd temp
pyenv virtualenv 2.7.15 temp
pyenv local temp
pip install pip==9.0.1
pip install apache-airflow==1.8.1
AIRFLOW_HOME=. airflow initdb
AIRFLOW_HOME=. airflow backfill -s 2018-01-01 -e 2018-01-02 example_bash_operator
pip install apache-airflow==1.9.0
AIRFLOW_HOME=. airflow upgradedb
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-source.tar.gz
AIRFLOW_HOME=. airflow upgradedb
{noformat}

This is a fine workaround, but absent a warning notice or a great deal of
digging into the code to understand how migrations work, there is no way for a
user to know that it is not possible to upgrade Airflow directly from <1.9.0
to >=1.10.0.
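Until such a notice exists, an operator could guard against the direct jump with a pre-flight check. Below is a minimal sketch, assuming a SQLite metadata database. It relies only on the fact stated in this issue that `cc1e65623dc7` adds a `max_tries` column to `task_instance`. If that column is missing, the database predates that migration and, per the workaround above, should go through 1.9.0 first. The function name and message are hypothetical; this is not an Airflow command.

```python
import sqlite3


def needs_intermediate_upgrade(conn):
    """True if task_instance exists but lacks max_tries (cc1e65623dc7 has not run)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    cols = [row[1] for row in conn.execute("PRAGMA table_info(task_instance)")]
    if not cols:
        # No task_instance table: a fresh install, which the issue says is unaffected.
        return False
    return "max_tries" not in cols


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, try_number INTEGER)")
    if needs_intermediate_upgrade(conn):
        print("Upgrade to 1.9.0 and run `airflow upgradedb` before installing 1.10.")
```

A database left half-migrated by an earlier failed attempt may already have the column. The check is therefore only meaningful before the first upgrade attempt.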

Furthermore, failing to upgrade and then trying to go through the intermediary 
version will leave the database in an inconsistent state that requires manual 
database intervention to repair:

{noformat}
cd temp
pyenv virtualenv 2.7.15 temp
pyenv local temp
pip install pip==9.0.1
pip install apache-airflow==1.8.1
AIRFLOW_HOME=. airflow initdb
AIRFLOW_HOME=. airflow backfill -s 2018-01-01 -e 2018-01-02 example_bash_operator
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-source.tar.gz
AIRFLOW_HOME=. airflow upgradedb
pip install apache-airflow==1.9.0
AIRFLOW_HOME=. airflow upgradedb
{noformat}
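The following is one plausible reading of why the retry fails; the traceback that follows is truncated, so this is an inference. The log reports `Will assume non-transactional DDL` on SQLite. If so, the `add_column` from the failed first attempt stays applied while the recorded Alembic revision never advances, and the second attempt then tries to add `max_tries` again. Below is a stdlib `sqlite3` sketch of that sequence. It uses autocommit mode to mimic non-transactional DDL, and the table shape is illustrative.

```python
import sqlite3


def simulate_failed_then_retried_upgrade():
    # isolation_level=None puts the connection in autocommit mode, so each
    # statement takes effect immediately, like non-transactional DDL.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE task_instance (task_id TEXT, dag_id TEXT)")

    # First attempt: the schema change succeeds...
    conn.execute("ALTER TABLE task_instance ADD COLUMN max_tries INTEGER DEFAULT -1")
    # ...then the data step fails because a column from a later migration is missing.
    try:
        conn.execute("SELECT task_instance.executor_config FROM task_instance")
    except sqlite3.OperationalError:
        pass  # assumed: the revision is not recorded, yet max_tries already exists

    # Second attempt re-runs the same migration from the start.
    try:
        conn.execute("ALTER TABLE task_instance ADD COLUMN max_tries INTEGER DEFAULT -1")
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(simulate_failed_then_retried_upgrade())
```

On a backend with transactional DDL, such as the PostgreSQL run in the original report, the failed attempt would presumably roll back the added column as well.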

Failure on the 1.9.0 upgrade attempt:

{noformat}
[2018-08-08 01:04:53,318] {__init__.py:45} INFO - Using executor SequentialExecutor
DB: sqlite:///./airflow.db
[2018-08-08 01:04:53,450] {db.py:312} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
Traceback (most recent call last):
  File "/Users/georgelesliewaksman/.pyenv/versions/temp2/bin/airflow", line 27, in
    args.func(args)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/bin/cli.py", line 913, in upgradedb
    db_utils.upgradedb()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/utils/db.py", line 320, in upgradedb
    command.upgrade(config, 'heads')
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/script/base.py", line 416, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 86, in
    run_migrations_online()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 81, in run_migrations_online
    context.run_migrations()
  File "", line 8, in run_migrations
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/environment.py", line 807, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/migration.py", line 321, in run_migrations
    step.migration_fn(**kw)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", line 39, in upgrade
    server_default="-1"))
  File "", line 8, in add_column
  File "", line 3, in add_column
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/operations/ops.py", line 1541, in add_column
    return operations.invoke(op)
  File

[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread George Leslie-Waksman (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572834#comment-16572834
 ] 

George Leslie-Waksman commented on AIRFLOW-2870:


Exact steps to reproduce:

{noformat}
cd temp
pyenv virtualenv 2.7.15 temp
pyenv local temp
pip install pip==9.0.1
pip install apache-airflow==1.8.1
AIRFLOW_HOME=. airflow initdb
AIRFLOW_HOME=. airflow backfill -s 2018-01-01 -e 2018-01-02 example_bash_operator
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-source.tar.gz
AIRFLOW_HOME=. airflow upgradedb
{noformat}

This results in the following error output:

{noformat}
[2018-08-08 00:51:32,656] {__init__.py:51} INFO - Using executor SequentialExecutor
/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/bin/cli.py:1596: DeprecationWarning: The celeryd_concurrency option in [celery] has been renamed to worker_concurrency - the old setting has been used, but please update your config.
  default=conf.get('celery', 'worker_concurrency')),
DB: sqlite:///./airflow.db
[2018-08-08 00:51:32,833] {db.py:338} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Could not import KubernetesPodOperator: No module named kubernetes
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Install kubernetes dependencies with: pip install airflow['kubernetes']
Traceback (most recent call last):
  File "/Users/georgelesliewaksman/.pyenv/versions/temp2/bin/airflow", line 32, in
    args.func(args)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/bin/cli.py", line 1020, in upgradedb
    db_utils.upgradedb()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/utils/db.py", line 346, in upgradedb
    command.upgrade(config, 'heads')
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/script/base.py", line 416, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 91, in
    run_migrations_online()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 86, in run_migrations_online
    context.run_migrations()
  File "", line 8, in run_migrations
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/environment.py", line 807, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/migration.py", line 321, in run_migrations
    step.migration_fn(**kw)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", line 66, in upgrade
    ).limit(BATCH_SIZE).all()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2703, in all
    return list(self)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2878, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams,

[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread George Leslie-Waksman (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572816#comment-16572816
 ] 

George Leslie-Waksman commented on AIRFLOW-2870:


The process to reproduce is as follows:
 # Start with an Airflow deployment that predates {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} (e.g. 1.8.1).
 # Run Airflow long enough to populate task_instance rows in the metadata database (e.g. run one of the sample DAGs).
 # Install an Airflow version that includes {{27c6a30d7c24_add_executor_config_to_task_instance.py}} (e.g. 1.10rc3).
 # Run {{airflow upgradedb}}.

This fails with an error stating that the column "task_instance.executor_config" does not exist.
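Reproducing this for real requires two Airflow installs. The failing query itself can be simulated, though. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, and a hypothetical, trimmed list of TaskInstance columns. It builds a SELECT from every column the newer model declares and runs it against a table shaped as it is while cc1e65623dc7 runs.

```python
import sqlite3

# Hypothetical, trimmed column list of the *current* TaskInstance model.
# executor_config is only added by a later migration (27c6a30d7c24).
MODEL_COLUMNS = ["task_id", "dag_id", "try_number", "max_tries", "executor_config"]


def make_old_schema_db():
    """Create an in-memory table as it looks while cc1e65623dc7 runs:
    max_tries has just been added, executor_config does not exist yet."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance "
        "(task_id TEXT, dag_id TEXT, try_number INTEGER, max_tries INTEGER)"
    )
    conn.execute("INSERT INTO task_instance VALUES ('t1', 'example_dag', 1, NULL)")
    return conn


def query_like_full_model(conn):
    """Select every column the model knows about, as the ORM mapper does."""
    sql = "SELECT {} FROM task_instance".format(", ".join(MODEL_COLUMNS))
    return conn.execute(sql).fetchall()


conn = make_old_schema_db()
try:
    query_like_full_model(conn)
except sqlite3.OperationalError as exc:
    # SQLite's counterpart of the psycopg2 "does not exist" error above.
    print(exc)
```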

My current understanding of what is happening:
 * When you build a SQLAlchemy ORM query from a declarative model (here, {{TaskInstance}}), the database table must match the structure of that model.
 ** SQLAlchemy's mapper selects every column it knows about on the code side and assumes they all exist in the database.
 * While a migration is running, the database table is in a transitional state.
 * The code in {{airflow/models.py}} reflects the state of the database after running ALL migrations up to the present.
 * When we use the 1.10rc3 code to run migrations and reach {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}}, we [import TaskInstance|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L36] as if it had all future columns, and then [query the old schema|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L64].
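One way to see the mismatch described above is to compare the columns a model declares with the columns the live table actually has. The sketch below does this with sqlite3 and its PRAGMA table_info. The helper name missing_columns and the column list are hypothetical. On PostgreSQL, the same information would come from information_schema.columns.

```python
import sqlite3


def missing_columns(conn, table, model_columns):
    """Return the model's columns that the live table does not have yet."""
    # PRAGMA table_info returns one row per column; index 1 holds the name.
    # The table name is interpolated, so it must come from trusted code.
    existing = {row[1] for row in conn.execute("PRAGMA table_info({})".format(table))}
    return [name for name in model_columns if name not in existing]


conn = sqlite3.connect(":memory:")
# The table as it looks mid-upgrade, before 27c6a30d7c24 has run.
conn.execute(
    "CREATE TABLE task_instance "
    "(task_id TEXT, dag_id TEXT, try_number INTEGER, max_tries INTEGER)"
)
model = ["task_id", "dag_id", "try_number", "max_tries", "executor_config"]
print(missing_columns(conn, "task_instance", model))  # ['executor_config']
```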

Under typical circumstances, this issue can be avoided by writing migrations with Alembic and SQLAlchemy Core (no ORM) that manipulate the tables directly. In this case, however, we need to populate information from a {{Task}} object, which has no representation in the database.
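A Core-style (no ORM) version of the backfill might look like the sketch below. It reads and writes only the columns that exist at this revision, and it works in batches like the BATCH_SIZE query in the traceback. For brevity, it uses sqlite3 and SQLite's implicit rowid. The TASK_RETRIES dict is a hypothetical stand-in for the Task objects that have no representation in the database. The fallback rule for unknown tasks is also an assumption, not Airflow's actual logic.

```python
import sqlite3

BATCH_SIZE = 500  # hypothetical value; mirrors the batched query in the traceback

# Hypothetical stand-in for retry counts that, in Airflow, live on Task
# objects defined in DAG files rather than in the database.
TASK_RETRIES = {("example_dag", "t1"): 3}


def backfill_max_tries(conn, task_retries, batch_size=BATCH_SIZE):
    """Fill max_tries using only columns that exist at this revision."""
    updated = 0
    while True:
        # Name only the columns this migration needs, nothing added later.
        rows = conn.execute(
            "SELECT rowid, dag_id, task_id, try_number FROM task_instance "
            "WHERE max_tries IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        for rowid, dag_id, task_id, try_number in rows:
            # Hypothetical rule: use the task's retries if the task is known,
            # otherwise fall back to the attempts already made (never NULL,
            # so every row leaves the "max_tries IS NULL" set and the loop ends).
            fallback = try_number if try_number is not None else 0
            max_tries = task_retries.get((dag_id, task_id), fallback)
            conn.execute(
                "UPDATE task_instance SET max_tries = ? WHERE rowid = ?",
                (max_tries, rowid),
            )
        updated += len(rows)
    conn.commit()
    return updated
```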

We may be able to work around the database issues by manipulating SQLAlchemy's [column loading|http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols], but that may be tricky given how intertwined Airflow's model code is.
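For the column-loading route, SQLAlchemy's load_only() option limits which column attributes an ORM query selects and defers the rest. The sketch below assumes SQLAlchemy 1.4 or later and an in-memory SQLite database. The trimmed TaskInstance model and the backfill rule are hypothetical, and the sketch has not been checked against Airflow's real models.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine, text
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()


class TaskInstance(Base):
    """Hypothetical, trimmed stand-in for the current TaskInstance model."""
    __tablename__ = "task_instance"
    task_id = Column(String(250), primary_key=True)
    dag_id = Column(String(250), primary_key=True)
    try_number = Column(Integer)
    max_tries = Column(Integer)
    executor_config = Column(Text)  # added by a *later* migration


def make_old_schema_engine():
    """In-memory SQLite table that lacks the executor_config column."""
    engine = create_engine("sqlite://")
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE task_instance (task_id VARCHAR(250), dag_id VARCHAR(250), "
            "try_number INTEGER, max_tries INTEGER, PRIMARY KEY (task_id, dag_id))"
        ))
        conn.execute(text(
            "INSERT INTO task_instance VALUES ('t1', 'example_dag', 2, NULL)"
        ))
    return engine


def backfill_with_load_only(engine):
    """Load only columns that exist at this revision, then update them."""
    with Session(engine) as session:
        tis = (
            session.query(TaskInstance)
            # Primary key columns are always loaded; all other columns are deferred.
            .options(load_only(TaskInstance.try_number, TaskInstance.max_tries))
            .all()
        )
        for ti in tis:
            ti.max_tries = ti.try_number  # hypothetical backfill rule
        session.commit()  # the UPDATE touches only max_tries
        return len(tis)
```

A plain session.query(TaskInstance) against this engine still fails, because it selects executor_config. Reading a deferred attribute such as ti.executor_config would trigger a separate SELECT and fail the same way. This is one reason the approach may be fragile.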



