[jira] [Commented] (AIRFLOW-31) Use standard imports for hooks/operators

2016-06-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339870#comment-15339870
 ] 

ASF subversion and git services commented on AIRFLOW-31:


Commit 45b735baeac794b54dd89ced2f43eec54adf13f7 in incubator-airflow's branch 
refs/heads/master from jlowin
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=45b735b ]

[AIRFLOW-31] Add zope dependency

Closes #1608 from jlowin/standard-imports-2.
Also closes AIRFLOW-257.


> Use standard imports for hooks/operators
> 
>
> Key: AIRFLOW-31
> URL: https://issues.apache.org/jira/browse/AIRFLOW-31
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Reporter: Jeremiah Lowin
> Assignee: Jeremiah Lowin
> Labels: enhancement
> Fix For: Airflow 1.8
>
>
> (Migrated from https://github.com/airbnb/airflow/issues/1238)
> Currently, Airflow uses a relatively complex import mechanism to import hooks 
> and operators without polluting the namespace with submodules. I would like 
> to propose that Airflow abandon that system and use standard Python importing.
> Here are a few major reasons why I think the current system has run its 
> course.
> h3. Polluting namespace
> The biggest advantage of the current system, as I understand it, is that only 
> Operators appear in the `airflow.operators` namespace.  The submodules that 
> actually contain the operators do not.
> So for example while `airflow.operators.python_operator.PythonOperator` is a 
> thing, `PythonOperator` is in the `airflow.operators` namespace but 
> `python_operator` is not.
> I think this sort of protection against namespace pollution was helpful when
> Airflow was a smaller project, but as the number of hooks/operators grows --
> and especially as the `contrib` hooks/operators grow -- I'd argue that
> namespacing is a *good thing*. It provides structure and organization, and
> opportunities for documentation (through module docstrings).
> In fact, I'd argue that the current namespace is itself getting quite
> polluted -- the only way to know what's available is to use something like
> IPython tab-completion to browse an alphabetical list of Operator names, or
> to load the source file and grok the import definition (which no one
> installing from PyPI is likely to do).
> h3. Conditional imports
> There's a second advantage to the current system: any module that fails
> to import is silently ignored, which makes it easy to have optional
> dependencies. For example, if someone doesn't have `boto` installed, then
> they don't have an `S3Hook` either. The same goes for `HiveOperator`.
> Again, as Airflow grows and matures, I think this is a little too magic. If 
> my environment is missing a dependency, I want to hear about it.
> On the other hand, the `contrib` namespace sort of depends on this -- we 
> don't want users to have to install every single dependency. So I propose 
> that contrib modules all live in their submodules: `from 
> airflow.contrib.operators.my_operator import MyOperator`. As mentioned 
> previously, having structure and namespacing is a good thing as the project 
> gets more complex.
> Other ways to handle this include importing "non-standard" dependencies inside
> the operator/hook rather than at module level (see `HiveOperator`/`HiveHook`),
> so the module can be imported even where the operator can't be used. Another
> is judicious use of `try`/`except ImportError`. The simplest is to make people
> import things explicitly from submodules.
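
A minimal sketch of the "import inside the operator/hook" option described above, not Airflow's actual code. The class name `ExampleS3Hook` and the `required_module` attribute are hypothetical; `boto` is used only because the description mentions it. The optional dependency is imported when the hook is used, so defining or importing the hook never fails, and a missing dependency produces an explicit error instead of a silently absent class.

```python
import importlib


class ExampleS3Hook(object):
    """Hypothetical hook for illustration; not Airflow's real S3Hook."""

    # Name of the optional third-party module this hook needs (assumption).
    required_module = "boto"

    def get_conn(self):
        # Deferred import: the dependency is only resolved when the hook is used.
        try:
            return importlib.import_module(self.required_module)
        except ImportError as exc:
            # Fail loudly and say which optional dependency is missing.
            raise ImportError(
                "ExampleS3Hook requires the optional dependency %r: %s"
                % (self.required_module, exc)
            )
```

Creating `ExampleS3Hook()` works on any machine; only `get_conn()` raises when the dependency is not installed.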
> h3. Operator dependencies
> Right now, operators can't depend on each other if they aren't in the same 
> file. This is for the simple reason that there is no guarantee on what order 
> the operators will be loaded. It all comes down to which dictionary key gets 
> loaded first. One day Operator B could be loaded after Operator A; the next 
> day it might be loaded before. Consequently, A and B can't depend on each 
> other. Worse, if a user makes two operators that do depend on each other, 
> they won't get an error message when one fails to import.
> For contrib modules in particular, this is sort of a killer.
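
The ordering problem can be illustrated with a toy model. This is only a sketch under stated assumptions: `load_all`, `definitions`, `OperatorA` and `OperatorB` are hypothetical names, and the loader is a stand-in for a dictionary-driven importer that swallows failures, not Airflow's real import code.

```python
def load_all(definitions, order):
    """Toy loader: build each class in `order`, silently skipping failures."""
    namespace = {}
    for name in order:
        try:
            namespace[name] = definitions[name](namespace)
        except KeyError:
            # Mirrors the silent-ignore behaviour: no error is reported.
            pass
    return namespace


definitions = {
    # OperatorA has no dependencies.
    "OperatorA": lambda ns: type("OperatorA", (object,), {}),
    # OperatorB subclasses OperatorA, so A must already be loaded.
    "OperatorB": lambda ns: type("OperatorB", (ns["OperatorA"],), {}),
}

a_first = load_all(definitions, ["OperatorA", "OperatorB"])
b_first = load_all(definitions, ["OperatorB", "OperatorA"])

print(sorted(a_first))  # both operators are available
print(sorted(b_first))  # OperatorB was dropped without any error message
```

With standard imports, the module defining `OperatorB` would import `OperatorA` itself, so the load order would no longer matter.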
> h3. Ease of use
> It's *hard* to set up imports for a new operator. The dictionary-based import
> instructions aren't obvious for new users, and errors are silently dismissed,
> which makes debugging difficult.
> h3. Identity
> Surprisingly, `airflow.operators.SubDagOperator != 
> airflow.operators.subdag_operator.SubDagOperator`. See #1168.
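
One plausible way such an identity mismatch can arise is that the same source gets executed more than once, producing two distinct class objects. The issue text does not state the cause (it only points to #1168), so the sketch below is an assumption for illustration; `SOURCE` and `load_module` are hypothetical names.

```python
import types

# Stand-in source for a module that defines one operator class.
SOURCE = "class SubDagOperator(object):\n    pass\n"


def load_module(name):
    """Execute SOURCE in a fresh module object, as a custom importer might."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


first = load_module("subdag_operator")
second = load_module("subdag_operator")

# Same name and same source, but two different class objects.
print(first.SubDagOperator is second.SubDagOperator)
# An instance of one class is not an instance of the other.
print(isinstance(first.SubDagOperator(), second.SubDagOperator))
```

Standard imports avoid this because Python caches each module in `sys.modules` and executes it only once per name.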
> h2. Proposal
> Use standard Python importing for hooks/operators/etc.
> - `__init__.py` files use straightforward, standard Python imports
> - major operators are available at `airflow.operators.OperatorName` or 
> `airflow.operators.operator_module.OperatorName`.
> - contrib operators are only available at 
> 

[jira] [Commented] (AIRFLOW-31) Use standard imports for hooks/operators

2016-06-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335919#comment-15335919
 ] 

ASF subversion and git services commented on AIRFLOW-31:


Commit 851adc5547597ec51743be4bc47d634c77d6dc17 in incubator-airflow's branch 
refs/heads/master from jlowin
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=851adc5 ]

[AIRFLOW-31] Use standard imports for hooks/operators


> Use standard imports for hooks/operators
> 
>
> Key: AIRFLOW-31
> URL: https://issues.apache.org/jira/browse/AIRFLOW-31
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Reporter: Jeremiah Lowin
> Assignee: Jeremiah Lowin
> Labels: enhancement
> Fix For: Airflow 2.0
>
>

[jira] [Commented] (AIRFLOW-31) Use standard imports for hooks/operators

2016-06-11 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325763#comment-15325763
 ] 

Chris Riccomini commented on AIRFLOW-31:


Please have a look at AIRFLOW-200, and 
[this|https://github.com/apache/incubator-airflow/pull/1586] PR. While it does 
not use standard imports, it does improve the experience for the current 
import style: people can see why hooks/operators fail to import, and loads 
become lazy.

> Use standard imports for hooks/operators
> 
>
> Key: AIRFLOW-31
> URL: https://issues.apache.org/jira/browse/AIRFLOW-31
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Reporter: Jeremiah Lowin
> Assignee: Jeremiah Lowin
> Labels: enhancement
> Fix For: Airflow 2.0
>
>

[jira] [Commented] (AIRFLOW-31) Use standard imports for hooks/operators

2016-05-02 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267741#comment-15267741
 ] 

Chris Riccomini commented on AIRFLOW-31:


_NOTE: I found this odd import behavior totally counterintuitive, and it took 
me a while to grok what was going on._

> Use standard imports for hooks/operators
> 
>
> Key: AIRFLOW-31
> URL: https://issues.apache.org/jira/browse/AIRFLOW-31
> Project: Apache Airflow
> Issue Type: Improvement
> Affects Versions: Airflow 2.0
> Reporter: Jeremiah Lowin
> Assignee: Jeremiah Lowin
> Labels: enhancement
>
> (Migrated from https://github.com/airbnb/airflow/issues/1238)
> Currently, Airflow uses a relatively complex import mechanism to import hooks 
> and operators without polluting the namespace with submodules. I would like 
> to propose that Airflow abandon that system and use standard Python importing.
> Here are a few major reasons why I think the current system has run its 
> course.
> h3. Polluting namespace
> The biggest advantage of the current system, as I understand it, is that only 
> Operators appear in the `airflow.operators` namespace.  The submodules that 
> actually contain the operators do not.
> So for example while `airflow.operators.python_operator.PythonOperator` is a 
> thing, `PythonOperator` is in the `airflow.operators` namespace but 
> `python_operator` is not.
> I think this sort of protection against namespace pollution was helpful when
> Airflow was a smaller project, but as the number of hooks/operators grows --
> and especially as the `contrib` hooks/operators grow -- I'd argue that
> namespacing is a *good thing*. It provides structure and organization, and
> opportunities for documentation (through module docstrings).
> In fact, I'd argue that the current namespace is itself getting quite
> polluted -- the only way to know what's available is to use something like
> IPython tab-completion to browse an alphabetical list of Operator names, or
> to load the source file and grok the import definition (which no one
> installing from PyPI is likely to do).
> h3. Conditional imports
> There's a second advantage to the current system: any module that fails
> to import is silently ignored, which makes it easy to have optional
> dependencies. For example, if someone doesn't have `boto` installed, then
> they don't have an `S3Hook` either. The same goes for `HiveOperator`.
> Again, as Airflow grows and matures, I think this is a little too magic. If 
> my environment is missing a dependency, I want to hear about it.
> On the other hand, the `contrib` namespace sort of depends on this -- we 
> don't want users to have to install every single dependency. So I propose 
> that contrib modules all live in their submodules: `from 
> airflow.contrib.operators.my_operator import MyOperator`. As mentioned 
> previously, having structure and namespacing is a good thing as the project 
> gets more complex.
> Other ways to handle this include importing "non-standard" dependencies inside
> the operator/hook rather than at module level (see `HiveOperator`/`HiveHook`),
> so the module can be imported even where the operator can't be used. Another
> is judicious use of `try`/`except ImportError`. The simplest is to make people
> import things explicitly from submodules.
> h3. Operator dependencies
> Right now, operators can't depend on each other if they aren't in the same 
> file. This is for the simple reason that there is no guarantee on what order 
> the operators will be loaded. It all comes down to which dictionary key gets 
> loaded first. One day Operator B could be loaded after Operator A; the next 
> day it might be loaded before. Consequently, A and B can't depend on each 
> other. Worse, if a user makes two operators that do depend on each other, 
> they won't get an error message when one fails to import.
> For contrib modules in particular, this is sort of a killer.
> h3. Ease of use
> It's *hard* to set up imports for a new operator. The dictionary-based import
> instructions aren't obvious for new users, and errors are silently dismissed,
> which makes debugging difficult.
> h3. Identity
> Surprisingly, `airflow.operators.SubDagOperator != 
> airflow.operators.subdag_operator.SubDagOperator`. See #1168.
> h2. Proposal
> Use standard Python importing for hooks/operators/etc.
> - `__init__.py` files use straightforward, standard Python imports
> - major operators are available at `airflow.operators.OperatorName` or 
> `airflow.operators.operator_module.OperatorName`.
> - contrib operators are only available at 
> `airflow.contrib.operators.operator_module.OperatorName` in order to manage 
> dependencies
> - operator authors are encouraged to use `__all__` to define their module's 
> exports
> Possibly delete namespace afterward
> - in `operators/__init__.py`, run a function at the