Re: Meetup Interest?

2017-10-14 Thread Marc Bollinger
+1 We'd definitely be in. Would love to chat more about K8s/Airflow--Data Eng has been a little twitchy about being the guinea pigs in our org, but the production app is now serving all traffic from it, so we're planning out our strategy. On Fri, Oct 13, 2017 at 1:29 PM, Daniel Imberman

Re: Redshift operation examples

2017-10-14 Thread Veeranagouda Mukkanagoudar
Thanks Andy, I think this is helpful. On Sat, Oct 14, 2017 at 12:22 PM, Andy Hadjigeorgiou wrote: > Hello, > > If you are looking for querying Redshift clusters, PostgresOperators and > PostgresHook is what you are looking for. Here's the docs >

Re: Redshift operation examples

2017-10-14 Thread Andy Hadjigeorgiou
This blog post has a good example of using PostgresHook to query a database - all you'd need to change is the connection info (to however you'd normally access your Redshift cluster to query). Andy --- Software Engineer | Fundera
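
For readers of the archive, here is a minimal sketch of the PostgresHook approach described above. It is not the code from the linked blog post. The connection ID `redshift_default`, the table name `public.events`, and the helper names `make_redshift_hook` and `count_rows` are all hypothetical, and the import path is the Airflow 1.x one that was current when this thread was written.

```python
"""Sketch: querying Redshift through Airflow's PostgresHook (Airflow 1.x)."""


def make_redshift_hook(conn_id="redshift_default"):
    # Imported inside the function so count_rows() below can be tried
    # without Airflow installed.
    from airflow.hooks.postgres_hook import PostgresHook

    # "redshift_default" is a hypothetical connection ID. Create a
    # Postgres-type connection in Airflow that points at your cluster
    # and pass its ID here.
    return PostgresHook(postgres_conn_id=conn_id)


def count_rows(hook, table):
    # get_first() runs the query and returns the first result row.
    # The table name is interpolated directly into the SQL, so only
    # pass trusted, hard-coded names.
    row = hook.get_first("SELECT COUNT(*) FROM {}".format(table))
    return row[0]


# Typical use inside a PythonOperator callable (hypothetical table name):
#     hook = make_redshift_hook()
#     print(count_rows(hook, "public.events"))
```

Taking the hook as an argument keeps the query logic separate from the connection setup, which makes it easy to swap in a different connection or a stub.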

Re: Redshift operation examples

2017-10-14 Thread Andy Hadjigeorgiou
Hello, If you are looking to query Redshift clusters, PostgresOperator and PostgresHook are what you are looking for. Here are the docs for both. If you are looking to manage Redshift clusters, right now you'd have to use

Redshift operation examples

2017-10-14 Thread Veeranagouda Mukkanagoudar
I am new to Airflow. Can anyone point me to Redshift/Postgres operator or task implementation examples? -Thanks, Veera

Re: spark sql hook with multiple queries

2017-10-14 Thread Boris Tyukin
Hi Fokko, looks like you've fixed the issue that was causing it :) [AIRFLOW-1562] Spark-sql logging contains deadlock. This is exactly what I was seeing: the process would just freeze on the second query, I guess waiting for the lock on the log file. Thanks! On Sat, Oct 14, 2017 at 5:07 AM,

Re: Return results optionally from spark_sql_hook

2017-10-14 Thread Boris
Thanks Fokko, I think it will do it, but my concern is that in this case my DAG will initiate two separate Spark sessions, and it takes about 20 seconds in our YARN environment to create one. I need to run 600 DAGs like that every morning. I am now thinking of creating a PySpark job that will do insert

Re: Return results optionally from spark_sql_hook

2017-10-14 Thread Driesprong, Fokko
Hi Boris, That sounds like a nice DAG. This is how I would do it: First, run the long-running query in a spark-sql operator like you have now. Create a Python function that builds a SparkSession within Python (using the PySpark API) and fetches the count from the spark partition that you've

Re: Question about skipping, state propagation, and trigger rules.

2017-10-14 Thread Daniel Lamblin [Data Science & Platform Center]
Thanks Alek, this is an interesting alternative approach that would accomplish what I'm looking for. I've got 24 such data staging tasks in a daily DAG, so going from 2 to 4 tasks per data source is only mildly more work. The sensor was an S3 prefix sensor from

Re: spark sql hook with multiple queries

2017-10-14 Thread Driesprong, Fokko
Hi Boris, Interesting. Multiple queries are supported by the spark-sql operator, and this should work using Airflow. Executing SQL from a file: Fokkos-MBP:~ fokkodriesprong$ spark-sql --driver-java-options "-Dlog4j.configuration=file:///tmp/log4j.properties" -f query.sql 1 Time taken: 1.976

Re: Return results optionally from spark_sql_hook

2017-10-14 Thread Driesprong, Fokko
Hi Boris, Thank you for your question, and excuse the late response; I'm currently on holiday. The solution that you suggest would not be my preferred choice. Extracting results from a log using a regex is computationally expensive and error-prone. My question is, what