[jira] [Updated] (AIRFLOW-116) Surface Airflow Version On Webservers

2016-05-16 Thread Siddharth Anand (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand updated AIRFLOW-116:

Attachment: git_hash_points_to_commit_tree.png

> Surface Airflow Version On Webservers
> -
>
> Key: AIRFLOW-116
> URL: https://issues.apache.org/jira/browse/AIRFLOW-116
> Project: Apache Airflow
>  Issue Type: Task
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Siddharth Anand
>Priority: Minor
> Attachments: Airflow_version_points_to_PyPi.png, 
> New_version_view.png, git_hash_points_to_commit_tree.png
>
>
> Surface Airflow Version On Webservers
> Why?
> Figuring out what version the webservers are running requires SSHing to the 
> webservers, which isn't very sane (and not everyone has permission to do 
> this).
> Success:
> Surface the current Airflow version in the webserver (bonus points if the git 
> SHA of the Airflow code is shown too, although this could be hacky), either on 
> every page as a static element or on a dedicated settings page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-116) Surface Airflow Version On Webservers

2016-05-16 Thread Siddharth Anand (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand updated AIRFLOW-116:

Attachment: New_version_view.png

> Surface Airflow Version On Webservers
> -
>
> Key: AIRFLOW-116
> URL: https://issues.apache.org/jira/browse/AIRFLOW-116
> Project: Apache Airflow
>  Issue Type: Task
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Siddharth Anand
>Priority: Minor
> Attachments: New_version_view.png
>
>
> Surface Airflow Version On Webservers
> Why?
> Figuring out what version the webservers are running requires SSHing to the 
> webservers, which isn't very sane (and not everyone has permission to do 
> this).
> Success:
> Surface the current Airflow version in the webserver (bonus points if the git 
> SHA of the Airflow code is shown too, although this could be hacky), either on 
> every page as a static element or on a dedicated settings page.





[2/2] incubator-airflow git commit: [AIRFLOW-109] Fix try catch handling in PrestoHook

2016-05-16 Thread arthur
[AIRFLOW-109] Fix try catch handling in PrestoHook

This addresses the issue with executing the SQL statement outside of
the try block. In the case of a syntax error in the statement, the
underlying library raises a DatabaseError which was meant to be
handled (i.e., json parsed) by the catch.
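The fix can be sketched in isolation like this; the exception class and cursor here are illustrative stand-ins, not the actual pyhive/PrestoHook objects:

```python
class DatabaseError(Exception):
    """Stand-in for the driver's DatabaseError."""
    pass

def get_records(cursor, sql):
    try:
        # the execute call must sit *inside* the try block, per the fix,
        # so a malformed statement's DatabaseError is actually caught
        cursor.execute(sql)
        return cursor.fetchall()
    except DatabaseError as e:
        # surface the parsed database error instead of leaking the raw one
        raise RuntimeError("Presto query failed: {}".format(e))
```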


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/6f4696ba
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/6f4696ba
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/6f4696ba

Branch: refs/heads/master
Commit: 6f4696ba2ef18d74be8c18080b8ea7b9419608fb
Parents: db07e04 d18a782
Author: Arthur Wiedmer 
Authored: Mon May 16 14:12:12 2016 -0700
Committer: Arthur Wiedmer 
Committed: Mon May 16 14:12:12 2016 -0700

--
 airflow/hooks/presto_hook.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[jira] [Commented] (AIRFLOW-109) PrestoHook get_pandas_df executes a method that can raise outside of the try catch statement.

2016-05-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285327#comment-15285327
 ] 

ASF subversion and git services commented on AIRFLOW-109:
-

Commit 6f4696ba2ef18d74be8c18080b8ea7b9419608fb in incubator-airflow's branch 
refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=6f4696b ]

[AIRFLOW-109] Fix try catch handling in PrestoHook

This addresses the issue with executing the SQL statement outside of
the try block. In the case of a syntax error in the statement, the
underlying library raises a DatabaseError which was meant to be
handled (i.e., json parsed) by the catch.


> PrestoHook get_pandas_df executes a method that can raise outside of the try 
> catch statement.
> -
>
> Key: AIRFLOW-109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-109
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 1.8, Airflow 1.7.1, Airflow 1.6.2
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Minor
>  Labels: Presto
>
> This issue occurs when a malformed SQL statement is passed to the 
> get_pandas_df method of the presto hook. PyHive raises a DatabaseError 
> outside of the try catch, leading to the wrong kind of error being raised.





[jira] [Commented] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285295#comment-15285295
 ] 

Chris Riccomini commented on AIRFLOW-85:


It looks like Airflow is already using Flask-Login, which means we can use the 
same Flask-Login stuff on the new page. 

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





[jira] [Commented] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285213#comment-15285213
 ] 

Chris Riccomini commented on AIRFLOW-118:
-

Was there a PR for this?

> use targetPartitionSize as the default partition spec for HiveToDruidTransfer 
> operator 
> ---
>
> Key: AIRFLOW-118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-118
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Hongbo Zeng
>
> The definitions of the two partition specs can be found at 
> http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> Originally, HiveToDruidTransfer used numShards. The disadvantage of that 
> approach is that users need to tune the number repeatedly, and do so again 
> when the data size changes. This is not scalable as the number of data 
> sources grows. The targetPartitionSize approach calculates the number of 
> segments automatically and is hassle-free.
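The choice the patch makes between the two specs can be sketched as follows. The field names follow the Druid docs linked above, and the -1 sentinel and 500 default mirror the commit; the function itself is illustrative:

```python
DEFAULT_TARGET_PARTITION_SIZE = 500

def partitions_spec(target_partition_size=-1, num_shards=-1):
    """Build a hashed partitionsSpec; -1 means 'unset' for either field."""
    if target_partition_size == -1:
        if num_shards == -1:
            # neither given: fall back to the targetPartitionSize default
            target_partition_size = DEFAULT_TARGET_PARTITION_SIZE
    else:
        # an explicit targetPartitionSize overrides num_shards
        num_shards = -1
    return {
        "type": "hashed",
        "targetPartitionSize": target_partition_size,
        "numShards": num_shards,
    }
```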





[jira] [Resolved] (AIRFLOW-118) use targetPartitionSize as the default partition spec for HiveToDruidTransfer operator

2016-05-16 Thread Dan Davydov (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Davydov resolved AIRFLOW-118.
-
Resolution: Fixed

> use targetPartitionSize as the default partition spec for HiveToDruidTransfer 
> operator 
> ---
>
> Key: AIRFLOW-118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-118
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Hongbo Zeng
>
> The definitions of the two partition specs can be found at 
> http://druid.io/docs/latest/ingestion/batch-ingestion.html.
> Originally, HiveToDruidTransfer used numShards. The disadvantage of that 
> approach is that users need to tune the number repeatedly, and do so again 
> when the data size changes. This is not scalable as the number of data 
> sources grows. The targetPartitionSize approach calculates the number of 
> segments automatically and is hassle-free.





[1/3] incubator-airflow git commit: use targetPartitionSize as the default partition spec

2016-05-16 Thread davydov
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 72ab63e83 -> db07e04f9


use targetPartitionSize as the default partition spec


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b565ef99
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b565ef99
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b565ef99

Branch: refs/heads/master
Commit: b565ef9952488b6a5f77becab3c430816af33e90
Parents: 07fe7d7
Author: Hongbo Zeng 
Authored: Sat May 14 17:00:42 2016 -0700
Committer: Hongbo Zeng 
Committed: Sat May 14 17:00:42 2016 -0700

--
 airflow/hooks/druid_hook.py| 23 ---
 airflow/operators/hive_to_druid.py |  8 +---
 2 files changed, 21 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b565ef99/airflow/hooks/druid_hook.py
--
diff --git a/airflow/hooks/druid_hook.py b/airflow/hooks/druid_hook.py
index b6cb231..7c80c7c 100644
--- a/airflow/hooks/druid_hook.py
+++ b/airflow/hooks/druid_hook.py
@@ -10,7 +10,7 @@ from airflow.hooks.base_hook import BaseHook
 from airflow.exceptions import AirflowException
 
 LOAD_CHECK_INTERVAL = 5
-
+TARGET_PARTITION_SIZE = 500
 
 class AirflowDruidLoadException(AirflowException):
 pass
@@ -52,13 +52,22 @@ class DruidHook(BaseHook):
 
 def construct_ingest_query(
 self, datasource, static_path, ts_dim, columns, metric_spec,
-intervals, num_shards, hadoop_dependency_coordinates=None):
+intervals, num_shards, target_partition_size, hadoop_dependency_coordinates=None):
 """
 Builds an ingest query for an HDFS TSV load.
 
 :param datasource: target datasource in druid
 :param columns: list of all columns in the TSV, in the right order
 """
+
+# backward compatibility for num_shards, but target_partition_size is the default setting
+# and overwrites the num_shards
+if target_partition_size == -1:
+    if num_shards == -1:
+        target_partition_size = TARGET_PARTITION_SIZE
+else:
+    num_shards = -1
+
 metric_names = [
 m['fieldName'] for m in metric_spec if m['type'] != 'count']
 dimensions = [c for c in columns if c not in metric_names and c != ts_dim]
@@ -100,7 +109,7 @@ class DruidHook(BaseHook):
 },
 "partitionsSpec" : {
 "type" : "hashed",
-"targetPartitionSize" : -1,
+"targetPartitionSize" : target_partition_size,
 "numShards" : num_shards,
 },
 },
@@ -121,10 +130,10 @@ class DruidHook(BaseHook):
 
 def send_ingest_query(
 self, datasource, static_path, ts_dim, columns, metric_spec,
-intervals, num_shards, hadoop_dependency_coordinates=None):
+intervals, num_shards, target_partition_size, hadoop_dependency_coordinates=None):
 query = self.construct_ingest_query(
 datasource, static_path, ts_dim, columns,
-metric_spec, intervals, num_shards, hadoop_dependency_coordinates)
+metric_spec, intervals, num_shards, target_partition_size, hadoop_dependency_coordinates)
 r = requests.post(
 self.ingest_post_url, headers=self.header, data=query)
 logging.info(self.ingest_post_url)
@@ -138,7 +147,7 @@ class DruidHook(BaseHook):
 
 def load_from_hdfs(
 self, datasource, static_path,  ts_dim, columns,
-intervals, num_shards, metric_spec=None, hadoop_dependency_coordinates=None):
+intervals, num_shards, target_partition_size, metric_spec=None, hadoop_dependency_coordinates=None):
 """
 load data to druid from hdfs
 :params ts_dim: The column name to use as a timestamp
@@ -146,7 +155,7 @@ class DruidHook(BaseHook):
 """
 task_id = self.send_ingest_query(
 datasource, static_path, ts_dim, columns, metric_spec,
-intervals, num_shards, hadoop_dependency_coordinates)
+intervals, num_shards, target_partition_size, hadoop_dependency_coordinates)
 status_url = self.get_ingest_status_url(task_id)
 while True:
 r = requests.get(status_url)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b565ef99/airflow/operators/hive_to_druid.py
--
diff --git a/airflow/operators/hive_to_druid.py 
b/airflow/operators/hive_to_druid.py
index 1346841..420aeed 100644
--- 

[2/3] incubator-airflow git commit: change TARGET_PARTITION_SIZE to DEFAULT_TARGET_PARTITION_SIZE

2016-05-16 Thread davydov
change TARGET_PARTITION_SIZE to DEFAULT_TARGET_PARTITION_SIZE


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/199e07a4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/199e07a4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/199e07a4

Branch: refs/heads/master
Commit: 199e07a455e77ceb33e06ab7646fa957a5fbd232
Parents: b565ef9
Author: Hongbo Zeng 
Authored: Mon May 16 11:20:49 2016 -0700
Committer: Hongbo Zeng 
Committed: Mon May 16 11:20:49 2016 -0700

--
 airflow/hooks/druid_hook.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/199e07a4/airflow/hooks/druid_hook.py
--
diff --git a/airflow/hooks/druid_hook.py b/airflow/hooks/druid_hook.py
index 7c80c7c..bb6d9fa 100644
--- a/airflow/hooks/druid_hook.py
+++ b/airflow/hooks/druid_hook.py
@@ -10,7 +10,7 @@ from airflow.hooks.base_hook import BaseHook
 from airflow.exceptions import AirflowException
 
 LOAD_CHECK_INTERVAL = 5
-TARGET_PARTITION_SIZE = 500
+DEFAULT_TARGET_PARTITION_SIZE = 500
 
 class AirflowDruidLoadException(AirflowException):
 pass
@@ -64,7 +64,7 @@ class DruidHook(BaseHook):
 # and overwrites the num_shards
 if target_partition_size == -1:
     if num_shards == -1:
-        target_partition_size = TARGET_PARTITION_SIZE
+        target_partition_size = DEFAULT_TARGET_PARTITION_SIZE
 else:
     num_shards = -1
 



[3/3] incubator-airflow git commit: Merge branch '1503'

2016-05-16 Thread davydov
Merge branch '1503'


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/db07e04f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/db07e04f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/db07e04f

Branch: refs/heads/master
Commit: db07e04f97f34bc43fccc238291d22253b85316d
Parents: 72ab63e 199e07a
Author: Dan Davydov 
Authored: Mon May 16 13:15:49 2016 -0700
Committer: Dan Davydov 
Committed: Mon May 16 13:15:49 2016 -0700

--
 airflow/hooks/druid_hook.py| 23 ---
 airflow/operators/hive_to_druid.py |  8 +---
 2 files changed, 21 insertions(+), 10 deletions(-)
--




[jira] [Commented] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285143#comment-15285143
 ] 

Bolke de Bruin commented on AIRFLOW-85:
---

Good start I think. You might also want to include an "ops" view, i.e. someone 
who can adjust pools, connections, etc. Over the coming days I will give it a 
bit more thought. But count me in :)

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





[jira] [Commented] (AIRFLOW-116) Surface Airflow Version On Webservers

2016-05-16 Thread Siddharth Anand (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285002#comment-15285002
 ] 

Siddharth Anand commented on AIRFLOW-116:
-

[this|http://stackoverflow.com/a/7071358/1110993] seems like a good way to 
specify versions. We currently have it specified (and duplicated) in two 
places: setup.py and __init__.py.
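The single-sourcing approach from that answer could look roughly like this in setup.py; the regex and path are assumptions for illustration, not the current Airflow code:

```python
import io
import re

def read_version(init_path="airflow/__init__.py"):
    """Parse __version__ out of the package's __init__.py so that
    setup.py no longer needs a manually synced copy of the version."""
    with io.open(init_path, encoding="utf-8") as f:
        match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', f.read())
    if match is None:
        raise RuntimeError("Unable to find __version__ in " + init_path)
    return match.group(1)
```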

> Surface Airflow Version On Webservers
> -
>
> Key: AIRFLOW-116
> URL: https://issues.apache.org/jira/browse/AIRFLOW-116
> Project: Apache Airflow
>  Issue Type: Task
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Siddharth Anand
>Priority: Minor
>
> Surface Airflow Version On Webservers
> Why?
> Figuring out what version the webservers are running requires SSHing to the 
> webservers, which isn't very sane (and not everyone has permission to do 
> this).
> Success:
> Surface the current Airflow version in the webserver (bonus points if the git 
> SHA of the Airflow code is shown too, although this could be hacky), either on 
> every page as a static element or on a dedicated settings page.





[jira] [Commented] (AIRFLOW-109) PrestoHook get_pandas_df executes a method that can raise outside of the try catch statement.

2016-05-16 Thread Siddharth Anand (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284984#comment-15284984
 ] 

Siddharth Anand commented on AIRFLOW-109:
-

+1

> PrestoHook get_pandas_df executes a method that can raise outside of the try 
> catch statement.
> -
>
> Key: AIRFLOW-109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-109
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 1.8, Airflow 1.7.1, Airflow 1.6.2
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Minor
>  Labels: Presto
>
> This issue occurs when a malformed SQL statement is passed to the 
> get_pandas_df method of the presto hook. PyHive raises a DatabaseError 
> outside of the try catch, leading to the wrong kind of error being raised.





[jira] [Commented] (AIRFLOW-116) Surface Airflow Version On Webservers

2016-05-16 Thread Siddharth Anand (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284978#comment-15284978
 ] 

Siddharth Anand commented on AIRFLOW-116:
-

I looked for places where __version__ is referenced. I saw the note in setup.py 
that we manually keep airflow.__version__ in sync, which I'm not sure how to 
interpret.

{quote}
(venv) sid-as-mbp:airflow siddharth$ grep -ri __version__ . | grep -v Binary | grep airflow
grep: ./airflow/www/static/docs: No such file or directory
./airflow/__init__.py:__version__ = "1.7.0"
./airflow/bin/cli.py:print(settings.HEADER + "  v" + airflow.__version__)
./airflow/www/views.py:pre_subtitle=settings.HEADER + "  v" + airflow.__version__,
./setup.py:# Kept manually in sync with airflow.__version__
{quote}

> Surface Airflow Version On Webservers
> -
>
> Key: AIRFLOW-116
> URL: https://issues.apache.org/jira/browse/AIRFLOW-116
> Project: Apache Airflow
>  Issue Type: Task
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Siddharth Anand
>Priority: Minor
>
> Surface Airflow Version On Webservers
> Why?
> Figuring out what version the webservers are running requires SSHing to the 
> webservers, which isn't very sane (and not everyone has permission to do 
> this).
> Success:
> Surface the current Airflow version in the webserver (bonus points if the git 
> SHA of the Airflow code is shown too, although this could be hacky), either on 
> every page as a static element or on a dedicated settings page.





[jira] [Commented] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284961#comment-15284961
 ] 

Chris Riccomini commented on AIRFLOW-85:


Going to write a wiki design proposal.

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





[jira] [Commented] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284956#comment-15284956
 ] 

Chris Riccomini commented on AIRFLOW-85:


A basic Flask-Principal Need would be:

{noformat}
('dag', 'view', 1)
('dag', 'edit', 1)
('dag', 'view', 2)
...
{noformat}

This allows viewer/editor needs on a per-DAG basis.
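Modelled in plain Python, the per-DAG needs above could work as follows; DagNeed and the check function are illustrative stand-ins for Flask-Principal's need tuples, with the field order taken from the tuples above:

```python
from collections import namedtuple

# Illustrative stand-in for a Flask-Principal style need:
# (resource type, action, dag id), as in ('dag', 'view', 1) above.
DagNeed = namedtuple("DagNeed", ["type", "action", "dag_id"])

# Needs granted to some hypothetical user:
user_needs = {
    DagNeed("dag", "view", 1),
    DagNeed("dag", "edit", 1),
    DagNeed("dag", "view", 2),
}

def allowed(needs, action, dag_id):
    """True if the user holds a need for this action on this DAG."""
    return DagNeed("dag", action, dag_id) in needs
```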

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





[jira] [Comment Edited] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284880#comment-15284880
 ] 

Chris Riccomini edited comment on AIRFLOW-85 at 5/16/16 5:29 PM:
-

Based on a cursory investigation of Flask-Login, Flask-Principal, and 
[flask-ldap3-login|https://pypi.python.org/pypi/flask-ldap3-login/], it seems 
like we should use Flask-Login to handle login, Flask-principal to manage 
user/group/roles, and flask-ldap3-login as the LDAP auth for login.

There also appears to be at least one Flask-Login Kerberos plugin, which would 
give us parity with the existing auth mechanism.

The question of how to manage viewer/edit access to specific DAGs remains. A 
simple approach would be to define permissions inside the DAG constructor in 
Python:

{noformat}
{
  'criccomini': 'editor',
  'fbar': 'viewer',
}
{noformat}

We could then use Flask-Principal to load the appropriate {{Need}}s when a 
user authenticates.


was (Author: criccomini):
Based on a cursory investigation of Flask-Login, Flask-Principal, and 
[flask-ldap3-login|https://pypi.python.org/pypi/flask-ldap3-login/], it seems 
like we should use Flask-Login to handle login, Flask-principal to manage 
user/group/roles, and flask-ldap3-login as the LDAP auth for login.

There also appears to be at least one Flask-Login Kerberos plugin, which would 
give us parity with the existing auth mechanism.

The question remains over how to manage viewer/edit access to specific DAGs. A 
simple approach would be to define permissions inside the DAG constructor in 
Python:

{noformat}
{
  'criccomini': 'editor',
  'fbar': 'viewer',
}
{noformat}

We could then use Flask-Principal to load the appropriate {{Need}}s when a 
user authenticates.

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





[jira] [Comment Edited] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284880#comment-15284880
 ] 

Chris Riccomini edited comment on AIRFLOW-85 at 5/16/16 5:29 PM:
-

Based on a cursory investigation of Flask-Login, Flask-Principal, and 
[flask-ldap3-login|https://pypi.python.org/pypi/flask-ldap3-login/], it seems 
like we should use Flask-Login to handle login, Flask-principal to manage 
user/group/roles, and flask-ldap3-login as the LDAP auth for login.

There also appears to be at least one Flask-Login Kerberos plugin, which would 
give us parity with the existing auth mechanism.

The question of how to manage viewer/edit access to specific DAGs remains. A 
simple approach would be to define permissions inside the DAG constructor in 
Python:

{noformat}
{
  'criccomini': 'editor',
  'fbar': 'viewer',
}
{noformat}

We could then use Flask-Principal to load the appropriate {{Need}}s when a 
user authenticates.


was (Author: criccomini):
Based on a cursory investigation of Flask-Login, Flask-Principal, and 
[flask-ldap3-login|https://pypi.python.org/pypi/flask-ldap3-login/], it seems 
like we should use Flask-Login to handle login, Flask-principal to manage 
user/group/roles, and flask-ldap3-login as the LDAP auth for login.

There also appears to be at least one Flask-Login Kerberos plugin, which would 
give us parity with the existing auth mechanism.

The question of how to manage viewer/edit access to specific DAGs remains. A 
simple approach would be to define permissions inside the DAG constructor in 
Python:

{noformat}
{
  'criccomini': 'editor',
  'fbar': 'viewer',
}
{noformat}

We could then use Flask-Principal to load the appropriate {{Need}}s when a 
user authenticates.

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





[jira] [Commented] (AIRFLOW-85) Create DAGs UI

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284880#comment-15284880
 ] 

Chris Riccomini commented on AIRFLOW-85:


Based on a cursory investigation of Flask-Login, Flask-Principal, and 
[flask-ldap3-login|https://pypi.python.org/pypi/flask-ldap3-login/], it seems 
like we should use Flask-Login to handle login, Flask-principal to manage 
user/group/roles, and flask-ldap3-login as the LDAP auth for login.

There also appears to be at least one Flask-Login Kerberos plugin, which would 
give us parity with the existing auth mechanism.

The question remains over how to manage viewer/edit access to specific DAGs. A 
simple approach would be to define permissions inside the DAG constructor in 
Python:

{noformat}
{
  'criccomini': 'editor',
  'fbar': 'viewer',
}
{noformat}

We could then use Flask-Principal to load the appropriate {{Need}}s when a 
user authenticates.
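A sketch of how such a constructor-level dict might translate into per-DAG needs at login time; the role names and tuple shape are hypothetical, chosen only to match the examples above:

```python
# Map a per-DAG {username: role} dict into flat permission tuples that a
# Flask-Principal identity could be loaded with. Names are illustrative.
ROLE_ACTIONS = {
    "viewer": ("view",),
    "editor": ("view", "edit"),
}

def needs_for_user(username, dag_id, dag_permissions):
    """dag_permissions is the {username: role} dict from the DAG constructor."""
    role = dag_permissions.get(username)
    if role is None:
        return set()
    return {("dag", action, dag_id) for action in ROLE_ACTIONS[role]}
```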

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.





incubator-airflow git commit: Use incubating instead of incubator in title

2016-05-16 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master fb1616a3c -> 72ab63e83


Use incubating instead of incubator in title


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/72ab63e8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/72ab63e8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/72ab63e8

Branch: refs/heads/master
Commit: 72ab63e83d68c09a77ede120ea316f5c1b9ff4d0
Parents: fb1616a
Author: Bolke de Bruin 
Authored: Mon May 16 17:52:36 2016 +0200
Committer: Bolke de Bruin 
Committed: Mon May 16 17:52:36 2016 +0200

--
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/72ab63e8/README.md
--
diff --git a/README.md b/README.md
index bb873fd..7587187 100644
--- a/README.md
+++ b/README.md
@@ -119,7 +119,7 @@ Currently **officially** using Airflow:
 ## Links
 
 * [Full documentation on pythonhosted.org](http://pythonhosted.org/airflow/)
-* [Airflow Apache (incubator) (mailing 
list)](http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/)
+* [Airflow Apache (incubating) (mailing 
list)](http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/)
 * [Airbnb Blog Post about Airflow](http://nerds.airbnb.com/airflow/)
 * [Airflow Common 
Pitfalls](https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls)
 * [Hadoop Summit Airflow Video](https://www.youtube.com/watch?v=oYp49mBwH60)



[jira] [Updated] (AIRFLOW-121) Documenting dag doc_md feature

2016-05-16 Thread dud (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dud updated AIRFLOW-121:

Description: 
Dear Airflow Maintainers,

I added a note about DAG documentation.

I'd be glad if my PR were merged: 
https://github.com/apache/incubator-airflow/pull/1493

Regards
dud


  was:
Dear Airflow Maintainers,

I added a note about DAG documentation.

I'd be glad if my PR would be merged.

Regards
dud



> Documenting dag doc_md feature
> --
>
> Key: AIRFLOW-121
> URL: https://issues.apache.org/jira/browse/AIRFLOW-121
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: dud
>Priority: Trivial
>
> Dear Airflow Maintainers,
> I added a note about DAG documentation.
> I'd be glad if my PR were merged: 
> https://github.com/apache/incubator-airflow/pull/1493
> Regards
> dud



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-117) Fix links in README.md

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284703#comment-15284703
 ] 

Chris Riccomini commented on AIRFLOW-117:
-

Closing due to the above commit.

> Fix links in README.md
> --
>
> Key: AIRFLOW-117
> URL: https://issues.apache.org/jira/browse/AIRFLOW-117
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Maxime Beauchemin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-117) Fix links in README.md

2016-05-16 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-117.
---
Resolution: Fixed

> Fix links in README.md
> --
>
> Key: AIRFLOW-117
> URL: https://issues.apache.org/jira/browse/AIRFLOW-117
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Maxime Beauchemin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-116) Surface Airflow Version On Webservers

2016-05-16 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284648#comment-15284648
 ] 

Chris Riccomini commented on AIRFLOW-116:
-

[This|http://stackoverflow.com/questions/2058802/how-can-i-get-the-version-defined-in-setup-py-setuptools-in-my-package]
 looks like a better solution. It uses the {{__version__}} variable in 
{{__init__.py}}.
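For reference, a sketch of that approach: parse {{__version__}} out of 
{{__init__.py}} with a regex so that setup.py (or the webserver) does not 
have to import the package to learn its version. The function name and the 
file path are illustrative only:

```python
import re

def read_version(init_path):
    # Extract __version__ = '...' from a package __init__.py without
    # importing it, per the linked Stack Overflow answer
    with open(init_path) as f:
        match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
                          f.read(), re.MULTILINE)
    if match is None:
        raise RuntimeError('Unable to find __version__ in %s' % init_path)
    return match.group(1)
```

The webserver could then render that string (or {{airflow.__version__}} 
directly) in a footer or on a dedicated version page.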

> Surface Airflow Version On Webservers
> -
>
> Key: AIRFLOW-116
> URL: https://issues.apache.org/jira/browse/AIRFLOW-116
> Project: Apache Airflow
>  Issue Type: Task
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Siddharth Anand
>Priority: Minor
>
> Surface Airflow Version On Webservers
> Why?
> Figuring out what version webservers are running requires sshing to the 
> webservers which isn't very sane (and not everyone has permissions to do 
> this).
> Success:
> Surface the current airflow version in the webserver (bonus points if the git 
> sha of the airflow code is shown too although this could be hacky), either on 
> every page as a static element or on a dedicated settings page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-120) "Template Not Found" error from QuboleOperator

2016-05-16 Thread Sumit Maheshwari (JIRA)
Sumit Maheshwari created AIRFLOW-120:


 Summary: "Template Not Found" error from QuboleOperator
 Key: AIRFLOW-120
 URL: https://issues.apache.org/jira/browse/AIRFLOW-120
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: Airflow 1.7.0
Reporter: Sumit Maheshwari


From the given example set, when I am using the QuboleOperator for a hive 
workload whose script resides in S3 and ends with ".qbl", I am getting a 
"Template Not Found" error.

Also, it would be nice if Airflow always tagged commands going from Airflow 
to QDS.
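For context, Airflow treats any templated field whose value ends in one of 
the operator's {{template_ext}} extensions as a file to resolve against the 
Jinja search path, which is why a remote script path with a matching 
extension can fail template lookup. A simplified sketch of that check (not 
Airflow's exact code; the ".qbl" extension list is assumed):

```python
def looks_like_template_file(value, template_ext=('.qbl', '.sql')):
    # Fields ending in a registered template extension are resolved as
    # files on the Jinja search path instead of being rendered inline,
    # so an s3:// script path triggers a local file lookup
    return isinstance(value, str) and value.endswith(tuple(template_ext))

assert looks_like_template_file('s3://bucket/scripts/job.qbl')
assert not looks_like_template_file('SELECT * FROM t')
```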



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)