Re: Encrypting Airflow Communications

2016-10-27 Thread siddharth anand
This might be relevant as well if you want to learn more about VPC :
https://www.youtube.com/watch?v=HexrVfuIY1k
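
For 2) in the original question, here is a minimal sketch of what TLS to the Celery broker could look like at the Celery level. Celery's `broker_use_ssl` setting (`BROKER_USE_SSL` in older releases) accepts a dict of options in the style of Python's `ssl` module when an AMQP broker is used. The helper name and certificate paths below are hypothetical, and how to pass this through Airflow's own Celery configuration is not covered in this thread, so verify it against your Airflow version.

```python
import ssl


def broker_ssl_options(ca_certs, certfile, keyfile):
    """Build an options dict in the shape Celery documents for broker_use_ssl (AMQP)."""
    return {
        "ca_certs": ca_certs,            # CA bundle used to verify the broker
        "certfile": certfile,            # client certificate
        "keyfile": keyfile,              # client private key
        "cert_reqs": ssl.CERT_REQUIRED,  # reject brokers without a valid certificate
    }


# Hypothetical paths, for illustration only.
options = broker_ssl_options(
    "/etc/ssl/certs/ca.pem",
    "/etc/ssl/certs/airflow-client.pem",
    "/etc/ssl/private/airflow-client.key",
)
# In a Celery config module this would be assigned as: broker_use_ssl = options
```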

On Thu, Oct 27, 2016 at 6:39 PM, siddharth anand  wrote:

> Hmm... it looks like traffic within an AWS VPC is not encrypted, so using
> TLS between all services is needed.
>
> http://serverfault.com/questions/573115/traffic-in-a-aws-virtual-private-cloud
>
> This video discusses the design of an AWS VPC.
> https://www.youtube.com/watch?v=Zd5hsL-JNY4
>
> -s
>
> On Thu, Oct 27, 2016 at 6:20 PM, siddharth anand 
> wrote:
>
>> I haven't looked into it but would welcome a PR if you were to propose
>> one. We use SQLAlchemy for our ORM, so you may want to look at that for 1)
>> above.
>>
>> We (Agari) run in AWS and run all our EC2-based services (e.g. Airflow
>> servers and the DB) within a VPC. I suspect the folks running in GCP have a
>> similar solution but I don't know enough about GCP.
>>
>> However, this won't help folks running Airflow in their own data centers
>> or in other shared environments.
>>
>>
>> -s
>>
>> On Thu, Oct 27, 2016 at 11:40 AM, Brandon White <
>> brandon.wh...@freenome.com> wrote:
>>
>>> From what I see, Airflow communicates with a couple sources:
>>>
>>> 1) SQL Store
>>> 2) Celery Broker
>>>
>>> Does Airflow have any configurations which make it easy to encrypt all of
>>> its communications or do we need to build custom solutions into Airflow?
>>>
>>> --
>>> This e-mail is private and confidential and is for the addressee only. If
>>> misdirected, please notify us by telephone, confirming that it has been
>>> deleted from your system and any hard copies destroyed. You are strictly
>>> prohibited from using, printing, distributing or disseminating it or any
>>> information contained in it save to the intended recipient.
>>>
>>
>>
>


Re: Encrypting Airflow Communications

2016-10-27 Thread siddharth anand
I haven't looked into it but would welcome a PR if you were to propose one.
We use SQLAlchemy for our ORM, so you may want to look at that for 1)
above.

We (Agari) run in AWS and run all our EC2-based services (e.g. Airflow
servers and the DB) within a VPC. I suspect the folks running in GCP have a
similar solution but I don't know enough about GCP.

However, this won't help folks running Airflow in their own data centers or
in other shared environments.
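
For 1), since the metadata database is reached through SQLAlchemy, encryption is usually requested through the connection URL or driver arguments rather than through Airflow itself. Below is a minimal sketch, assuming a PostgreSQL backend, where libpq's `sslmode` parameter can be carried in the URL query string. The helper name and the example URL are hypothetical, and other databases use different parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sslmode(conn_url, mode="require"):
    """Return conn_url with sslmode set in its query string (PostgreSQL/libpq only)."""
    parts = urlsplit(conn_url)
    query = dict(parse_qsl(parts.query))
    query["sslmode"] = mode  # overrides any existing sslmode value
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical connection URL, for illustration only.
secured = with_sslmode("postgresql://airflow:secret@db.internal:5432/airflow")
print(secured)
```

The resulting URL would be the value of `sql_alchemy_conn` in airflow.cfg. Note that `sslmode=require` encrypts the connection but does not verify the server certificate; `verify-full` does, at the cost of distributing a CA certificate to the Airflow hosts.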


-s

On Thu, Oct 27, 2016 at 11:40 AM, Brandon White 
wrote:

> From what I see, Airflow communicates with a couple sources:
>
> 1) SQL Store
> 2) Celery Broker
>
> Does Airflow have any configurations which make it easy to encrypt all of
> its communications or do we need to build custom solutions into Airflow?
>


Encrypting Airflow Communications

2016-10-27 Thread Brandon White
From what I see, Airflow communicates with a couple sources:

1) SQL Store
2) Celery Broker

Does Airflow have any configurations which make it easy to encrypt all of
its communications or do we need to build custom solutions into Airflow?



Re: Next Release?

2016-10-27 Thread siddharth anand
Resolving PRs (i.e. cold case or recently submitted) & releasing new
versions in a timely manner are both part of meeting community expectations.
In this case, they are not tied to each other - the Cold Case PR clean-up
activity is a call-to-action.

We can all run the RC once it is cut - we are waiting for some guidance
from Max right now.
-s

On Thu, Oct 27, 2016 at 9:53 AM, Alex Van Boxel  wrote:

> I thought that the 15 November deadline for PRs was in preparation for the
> 1.8 release. Do you need help with the release? I'm dedicating some time
> each week to Airflow anyway (although it's mostly writing operators :-).
>
> On Thu, Oct 27, 2016 at 6:22 PM siddharth anand  wrote:
>
> > I believe the release manager is/was Max, though it might be Dan (@aoen).
> >
> > -s
> >
> > On Thu, Oct 27, 2016 at 9:20 AM, Chris Riccomini 
> wrote:
> >
> > > No news AFAIK. I think
> > >
> > > 1) someone needs to be release mgr
> > > 2) release mgr needs to cut RC
> > > 3) we all need to deploy RC
> > >
> > > On Thu, Oct 27, 2016 at 1:19 AM, siddharth anand 
> > > wrote:
> > >
> > > > Max, Chris, Bolke?
> > > >
> > > > Any news on the 1.8 release?
> > > > -s
> > > >
> > >
> >
>


Re: Next Release?

2016-10-27 Thread Alex Van Boxel
I thought that the 15 November deadline for PRs was in preparation for the
1.8 release. Do you need help with the release? I'm dedicating some time
each week to Airflow anyway (although it's mostly writing operators :-).

On Thu, Oct 27, 2016 at 6:22 PM siddharth anand  wrote:

> I believe the release manager is/was Max, though it might be Dan (@aoen).
>
> -s
>
> On Thu, Oct 27, 2016 at 9:20 AM, Chris Riccomini  wrote:
>
> > No news AFAIK. I think
> >
> > 1) someone needs to be release mgr
> > 2) release mgr needs to cut RC
> > 3) we all need to deploy RC
> >
> > On Thu, Oct 27, 2016 at 1:19 AM, siddharth anand 
> > wrote:
> >
> > > Max, Chris, Bolke?
> > >
> > > Any news on the 1.8 release?
> > > -s
> > >
> >
>


Re: Next Release?

2016-10-27 Thread siddharth anand
I believe the release manager is/was Max, though it might be Dan (@aoen).

-s

On Thu, Oct 27, 2016 at 9:20 AM, Chris Riccomini  wrote:

> No news AFAIK. I think
>
> 1) someone needs to be release mgr
> 2) release mgr needs to cut RC
> 3) we all need to deploy RC
>
> On Thu, Oct 27, 2016 at 1:19 AM, siddharth anand 
> wrote:
>
> > Max, Chris, Bolke?
> >
> > Any news on the 1.8 release?
> > -s
> >
>


Re: Regarding hive server2

2016-10-27 Thread Maxime Beauchemin
Have you tried `PLAIN`? The best approach is probably to fire up an IPython
shell and interact with the underlying library until you find a working
path. Then you can read the fairly simple HiveServer2Hook code and figure
out how to pass your configuration settings through.
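
A minimal sketch of that kind of interactive probing follows. The helper name is hypothetical. It takes the connect function as an argument so it can be pointed at `pyhs2.connect`, which, as far as the pyhs2 README shows, accepts `host`, `port`, `authMechanism`, `user`, `password` and `database` keyword arguments; treat that signature as an assumption to check against the installed version.

```python
def probe_auth_mechanisms(connect, host, port,
                          mechanisms=("PLAIN", "NOSASL"), **kwargs):
    """Try each auth mechanism in turn and record 'ok' or the error message."""
    results = {}
    for mech in mechanisms:
        try:
            conn = connect(host=host, port=port, authMechanism=mech, **kwargs)
        except Exception as exc:  # broad on purpose: this is a diagnostic probe
            results[mech] = "failed: %s" % exc
        else:
            results[mech] = "ok"
            conn.close()  # assumes the returned connection object has close()
    return results
```

In an IPython shell this could be called as `probe_auth_mechanisms(pyhs2.connect, "your-hive-host", 10000, user="someuser")`. Whichever mechanism reports `ok` is the value to put under `authMechanism` in the Connection's extra params.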

Max

On Thu, Oct 27, 2016 at 2:01 AM, twinkle sachdeva <
twinkle.sachd...@gmail.com> wrote:

> Hi Maxime,
>
> Before this setting, I was getting the following exception:
>
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/pyhs2/cloudera/thrift_sasl.py", line 66, in open
>     message=("Could not start SASL: %s" % self.sasl.getError()))
> thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found
>
>
> After using the NOSASL setting, I am getting the following exception:
>
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 154, in OpenSession
>     return self.recv_OpenSession()
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 165, in recv_OpenSession
>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 140, in readMessageBegin
>     name = self.trans.readAll(sz)
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
>     chunk = self.read(sz - have)
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 159, in read
>     self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size)))
>   File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 118, in read
>     message='TSocket read 0 bytes')
> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes.
>
>
> I did check whether the Hive server is running.
>
> I get the same issue if I try to connect using a simple Python program.
>
> On Wed, Oct 26, 2016 at 9:17 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > From memory, I think this is related to having the wrong authentication
> > method.
> > https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L578
> >
> > You may want to try NOSASL. To do that I think you have to put something
> > like `{ "authMechanism": "NOSASL" }` in your Connection's extra params.
> >
> > On Wed, Oct 26, 2016 at 1:50 AM, twinkle sachdeva <
> > twinkle.sachd...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying to use HiveToMySqlTransfer operator, but I am not able to read
> > > any data with the following configuration:
> > >
> > > TSocket.py", line 120, in read
> > >
> > > message='TSocket read 0 bytes')
> > >
> > > thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
> > >
> > > It seems to happen due to some mismatch in thrift protocol etc
> > > specification.
> > >
> > > Please help me on what can be done.
> > >
> > >
> > > Regards,
> > >
> > > Twinkle
> > >
> >
>


retry handler not getting called

2016-10-27 Thread twinkle sachdeva
Hi,

I am working with a HiveToMySqlTransfer operation, which is not able to
connect and gives the following logs:

HiveServer2Error: Failed after retrying 3 times

[2016-10-27 01:54:46,524] {models.py:1298} INFO - Marking task as UP_FOR_RETRY


As per the code in models.py
(https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1347),
it should be calling the retry handler (we are using version 1.7.1.3), but
that is not happening.


Here is the relevant code snippet:


def my_retry_handler(context):
    print "my_retry_handler called"
    slack_retry_notification = SlackAPIPostOperator(
        task_id='Slack_Failure_Notification',
        token="XX",
        channel='@yy',
        text=":skull: - {time} - {dag} attempt has been failed".format(
            dag='Some Dag',
            time=datetime.now().strftime("%Y-%m-%d %H:%M:%S")),
        owner='Retry Handler')
    return slack_retry_notification.execute


hiveToMySql = HiveToMySqlTransfer(
    sql="jdbsd",
    hiveserver2_conn_id='hiveserver2_default',
    # (other arguments omitted in the original message)
    on_retry_callback=my_retry_handler)


Is there something I am missing?



Regards,

Twinkle
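
One detail in the snippet above is worth checking regardless of whether 1.7.1.3 invokes `on_retry_callback` at all: the handler returns the bound method `slack_retry_notification.execute` instead of calling it, so even when the handler runs, nothing is posted unless the caller invokes the return value. Below is a minimal sketch of the difference, using a hypothetical `FakeNotifier` stand-in rather than the real `SlackAPIPostOperator`:

```python
class FakeNotifier(object):
    """Hypothetical stand-in for SlackAPIPostOperator: records calls instead of posting."""

    def __init__(self):
        self.calls = []

    def execute(self, context=None):
        self.calls.append(context)


notifier = FakeNotifier()


def handler_returning_method(context):
    # Mirrors the original snippet: hands back the method without calling it.
    return notifier.execute


def handler_calling_method(context):
    # Actually runs the notification.
    notifier.execute(context=context)


handler_returning_method({"try_number": 1})
calls_after_first = len(notifier.calls)   # execute() has not run yet

handler_calling_method({"try_number": 1})
calls_after_second = len(notifier.calls)  # execute() has run once
```

If the caller ignores the callback's return value (an assumption; check models.py for the version in use), the handler would need to call `execute(context=context)` itself.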


Re: Regarding hive server2

2016-10-27 Thread twinkle sachdeva
Hi Maxime,

Before this setting, I was getting the following exception:

  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/pyhs2/cloudera/thrift_sasl.py", line 66, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found


After using the NOSASL setting, I am getting the following exception:

  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 154, in OpenSession
    return self.recv_OpenSession()
  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 165, in recv_OpenSession
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 140, in readMessageBegin
    name = self.trans.readAll(sz)
  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 159, in read
    self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File "/home/xxx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 118, in read
    message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes.


I did check whether the Hive server is running.

I get the same issue if I try to connect using a simple Python program.

On Wed, Oct 26, 2016 at 9:17 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> From memory, I think this is related to having the wrong authentication
> method.
> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L578
>
> You may want to try NOSASL. To do that I think you have to put something
> like `{ "authMechanism": "NOSASL" }` in your Connection's extra params.
>
> On Wed, Oct 26, 2016 at 1:50 AM, twinkle sachdeva <
> twinkle.sachd...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to use HiveToMySqlTransfer operator, but I am not able to read
> > any data with the following configuration:
> >
> > TSocket.py", line 120, in read
> >
> > message='TSocket read 0 bytes')
> >
> > thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
> >
> > It seems to happen due to some mismatch in thrift protocol etc
> > specification.
> >
> > Please help me on what can be done.
> >
> >
> > Regards,
> >
> > Twinkle
> >
>


Next Release?

2016-10-27 Thread siddharth anand
Max, Chris, Bolke?

Any news on the 1.8 release?
-s


Re: Call to Committers : PR Clean up Duty : ETA Nov 15

2016-10-27 Thread siddharth anand
We are down to 71 open PRs from 110 a couple of weeks ago.

Chris, Bolke, Jeremiah, Dan, Max? Can we knock that number down to 30?

[image: Inline image 1]

On Wed, Oct 26, 2016 at 12:58 AM, siddharth anand  wrote:

> Pretty good activity on Cold Case PR clean-up. We've cleaned up about 30,
> but will still have at least 50 cold case PRs to resolve.
>
>
> All the empty, red, or brown items need to be resolved however. @artwr
> (arthur), @mistercrunch (max), @zodian, and @jlowin, can you aim to resolve
> your target PRs this week? Also, if others have time, please pitch in.
>
> -s
>
> [image: Inline image 1]
>
> On Mon, Oct 17, 2016 at 10:33 AM, siddharth anand 
> wrote:
>
>> Arthur's picked up 12 (Yaay)! Please work with him to resolve your PRs.
>>
>> Steven, Dan (Aoen), Chris, Bolke, Patrick? Any chance you can take on a
>> few?
>>
>>
>> [image: Inline image 1]
>>
>> On Sun, Oct 16, 2016 at 2:53 PM, siddharth anand 
>> wrote:
>>
>>> I've closed around 13... less than half of which were merged - the
>>> remainder were closed after 4-5 days of not hearing from submitters. If
>>> resubmitted, I'd be happy to take a look.
>>>
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>>
>>> I noticed that msumit & mistercrunch reached out to submitters for their
>>> cold-case PRs. Please work with them. For any PRs closed without merging,
>>> please update the JIRA to be unassigned and to no longer point to the dead
>>> PR. Please review the link above.
>>>
>>> JIRAs with "squatters" are an anti-pattern that I have also been guilty
>>> of. It's best to take on JIRAs that we have a reasonable chance of
>>> delivering in a few weeks.
>>>
>>>
>>> -s
>>>
>>> On Wed, Oct 12, 2016 at 1:09 PM, Ben Tallman  wrote:
>>>
>>>> Sid -
>>>>
>>>> Thanks for staying on top of this. One of the most important things when we
>>>> looked at Airflow vs Others was the health of the community (OK, a lack of
>>>> valid competition was also important).
>>>>
>>>> When the community makes an effort to contribute, it requires PRs to be
>>>> moderated and handled. To that end, staying on top of PRs is a huge
>>>> commitment, as well as a sign of health.
>>>>
>>>> Ben
>>>>
>>>>
>>>> Thanks,
>>>> Ben


>>>> On Wed, Oct 12, 2016 at 12:38 PM, siddharth anand
>>>> wrote:
>>>>
>>>> > Max,
>>>> > Thanks for adding to the list.
>>>> > https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>>> >
>>>> >
>>>> > If you're a committer on Airflow, please read closely.
>>>> >
>>>> > I would like us to commit to reviewing PRs within 2 weeks. 90% of our PRs
>>>> > were older than that as per an earlier email. That's resulting in a bad
>>>> > experience for our community and contributors. Before we can make that
>>>> > commitment, we need to clean up what are mostly abandoned PRs.
>>>> >
>>>> > Please do the following at your earliest:
>>>> >
>>>> >    - Pick 10 PRs opened before Oct 2
>>>> >    - Review each one. If the PR is ready to merge, please test and merge it
>>>> >    - In most cases, the PRs require some action from the submitters
>>>> >       - Comment on the PR asking the submitter to update the PR
>>>> >       - If the submitter does not respond within a week, you can close the
>>>> >       PR with comments such as "PR abandoned by submitter" or "no movement
>>>> >       from submitter"
>>>> >       - If the submitter responds and keeps the PR alive, please work with
>>>> >       them
>>>> >    - If you are a contributor, please work with the committers to bring
>>>> >    your PRs to a positive outcome
>>>> >
>>>> > We currently have <100 open PRs, and many of you have already
>>>> > started working on this.
>>>> >
>>>> > Bolke, Chris, Dan (aoen), Patrick, Steven: Please update the wiki above at
>>>> > your earliest