[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-11-10 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205626#comment-14205626 ]

Apache Spark commented on SPARK-3398:
-

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/3195

 Have spark-ec2 intelligently wait for specific cluster states
 -

 Key: SPARK-3398
 URL: https://issues.apache.org/jira/browse/SPARK-3398
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Nicholas Chammas
Assignee: Nicholas Chammas
Priority: Minor
 Fix For: 1.2.0


 {{spark-ec2}} currently has retry logic for when it tries to install stuff on 
 a cluster and for when it tries to destroy security groups. 
 It would be better to have some logic that allows {{spark-ec2}} to explicitly 
 wait until all the nodes in the cluster it is working on have reached a 
 specific state.
 Examples:
 * Wait for all nodes to be up
 * Wait for all nodes to be up and accepting SSH connections (then start 
 installing stuff)
 * Wait for all nodes to be down
 * Wait for all nodes to be terminated (then delete the security groups)
 Having a function in the {{spark_ec2.py}} script that blocks until the 
 desired cluster state is reached would reduce the need for various retry 
 logic. It would probably also eliminate the need for the {{--wait}} parameter.
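
A rough sketch of what such a blocking wait could look like with boto (the 
function name mirrors the one discussed in the comments below, but the 
signature and details here are illustrative, not the actual {{spark_ec2.py}} 
implementation):

{code}
import time

def wait_for_cluster_state(instances, desired_state, poll_interval_secs=5):
    """Block until every boto instance in the cluster reports desired_state
    (e.g. 'running' or 'terminated')."""
    while not all(i.state == desired_state for i in instances):
        time.sleep(poll_interval_secs)
        for i in instances:
            i.update()  # refresh the instance's state from EC2
{code}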






[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-29 Thread Michael Griffiths (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188357#comment-14188357 ]

Michael Griffiths commented on SPARK-3398:
--

Hi Nicholas,

Thanks for the thorough investigation!

Making the path absolute does work for me, when called with spark-ec2.




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-29 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188938#comment-14188938 ]

Nicholas Chammas commented on SPARK-3398:
-

No problem. I've opened [SPARK-4137] to track this issue, and [PR 
2988|https://github.com/apache/spark/pull/2988] to resolve it.




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Michael Griffiths (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187471#comment-14187471 ]

Michael Griffiths commented on SPARK-3398:
--

I'm running into an issue with {{wait_for_cluster_state}} - specifically, 
waiting for {{ssh-ready}}.

AFAICT the [valid states in boto 
are|http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.instance.InstanceState]:

* pending
* running
* shutting-down
* terminated
* stopping
* stopped

When I invoke spark_ec2.py, it never moves to the next stage (infinite loop).

Is {{ssh-ready}} a state in a different version of boto? 
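
For reference, polling instance states with boto only ever yields the values 
listed above - a rough sketch (the region and filter below are placeholders, 
not what spark_ec2.py uses):

{code}
import time
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')  # placeholder region
reservations = conn.get_all_instances(filters={'tag:Name': 'spark-cluster'})  # placeholder filter
instances = [i for r in reservations for i in r.instances]

# instance.state is always one of the six values listed above
while not all(i.state == 'running' for i in instances):
    time.sleep(5)
    for i in instances:
        i.update()
{code}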

Thanks,
Michael




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187519#comment-14187519 ]

Nicholas Chammas commented on SPARK-3398:
-

[~michael.griffiths] - 
[{{wait_for_cluster_state}}|https://github.com/apache/spark/blob/4b55482abf899c27da3d55401ad26b4e9247b327/ec2/spark_ec2.py#L634]
 will take any of the valid boto states, plus {{ssh-ready}}. {{ssh-ready}} is 
not a boto state, but rather a handy label for a relevant state that we want to 
wait for. {{spark-ec2}} manually checks for this state by testing SSH 
availability on each of the nodes in the cluster.

How are you invoking {{spark-ec2}}? Sometimes instances can take a few minutes 
before SSH becomes available. How long have you waited?
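
For reference, that SSH availability check amounts to something like this (a 
sketch - the option handling in the actual {{spark_ec2.py}} code differs):

{code}
import subprocess

def is_ssh_available(host, user, identity_file):
    """Return True if a trivial command can be run on the host over SSH."""
    ret = subprocess.call(
        ['ssh', '-i', identity_file,
         '-o', 'StrictHostKeyChecking=no',
         '-o', 'ConnectTimeout=3',
         '%s@%s' % (user, host), 'true'])
    return ret == 0
{code}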




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187553#comment-14187553 ]

Nicholas Chammas commented on SPARK-3398:
-

Hmm, I'm curious:
# Why did you have to run {{spark-ec2}} again with {{--resume}}?
# Are you using an AMI other than the standard one?
# If yes, do you know what shell that AMI defaults to? What does 
{{true ; echo $?}} return on that shell?




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Michael Griffiths (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187566#comment-14187566 ]

Michael Griffiths commented on SPARK-3398:
--

In order - 

 # I tried a few times; it kept failing. Ultimately I ran it once to set up the 
instances, and then waited to ensure I could SSH into them manually before 
running again.

# No, I'm using the default AMI. The only parameters I'm passing are the SSH 
keyname, the key file, and cluster name.

# {{true ; echo $?}} returns 0.




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187593#comment-14187593 ]

Nicholas Chammas commented on SPARK-3398:
-

OK, so you're invoking {{spark-ec2}} from an Ubuntu server. I wonder if that 
matters any, specifically when we make [this 
call|https://github.com/apache/spark/blob/4b55482abf899c27da3d55401ad26b4e9247b327/ec2/spark_ec2.py#L615].

What happens if you replace the code at that line with this version?

{code}
ret = subprocess.check_call(
    ssh_command(opts) + ['-t', '-t', '-o', 'ConnectTimeout=3',
                         '%s@%s' % (opts.user, host),
                         stringify_command('true')]
)
{code}

This will just print SSH's output to the screen instead of suppressing it. If 
anything's going wrong, it should be more obvious that way.




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187861#comment-14187861 ]

Nicholas Chammas commented on SPARK-3398:
-

So I spun up an Ubuntu server on EC2 and was able to reproduce this issue. For 
some reason, the call to SSH in the [referenced 
line|https://github.com/apache/spark/blob/4b55482abf899c27da3d55401ad26b4e9247b327/ec2/spark_ec2.py#L615]
 fails because it can't find the {{pem}} file passed in to {{spark-ec2}}.

Strange. I'm looking into why.




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187898#comment-14187898 ]

Nicholas Chammas commented on SPARK-3398:
-

I think I've found the issue. It doesn't have anything to do with Ubuntu or 
with {{wait_for_cluster_state}}.

[~michael.griffiths] - Did {{spark-ec2 launch --resume}} and {{spark-ec2 
login}} ultimately work for you to the point where you had a working Spark EC2 
cluster? Or are you not sure if in the end you were able to get a working 
cluster?

What I'm seeing is that the issue is whether the path to the SSH identity 
file is specified relative to the current working directory or as an absolute 
path.

Do you still see the same issue if you specify the path to the identity file 
as an absolute path?

That is:

{code}
# Currently not working
spark-ec2 -i ../my.pem
{code}

{code}
# Should work
spark-ec2 -i ~/my.pem
spark-ec2 -i /home/me/my.pem
{code}
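
For reference, normalizing the identity file path as soon as the options are 
parsed would be one way to sidestep this entirely (a sketch, not what 
{{spark_ec2.py}} currently does):

{code}
import os

def normalize_identity_file(path):
    """Expand ~ and resolve the path against the current working directory."""
    return os.path.abspath(os.path.expanduser(path))

# e.g. normalize_identity_file('../my.pem') run from /home/me/somedir
# returns '/home/me/my.pem'
{code}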




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-10-28 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187900#comment-14187900 ]

Nicholas Chammas commented on SPARK-3398:
-

If that fixes it for you, then I think the solution is simple. We just need to 
set {{cwd}} to the user's current working directory in all our calls to 
[{{subprocess.check_call()}}|https://docs.python.org/2/library/subprocess.html#subprocess.check_call].
 Right now it defaults to the {{spark-ec2}} directory, which will be 
problematic if you call {{spark-ec2}} from another directory.
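
Roughly like this (a sketch of the idea with placeholder values; 
{{SPARK_EC2_INVOKE_DIR}} is a hypothetical way of capturing the user's 
directory, not something spark-ec2 provides today):

{code}
import os
import subprocess

# Hypothetical: the wrapper script exports the directory the user invoked
# spark-ec2 from before cd'ing into the spark-ec2 directory.
user_dir = os.environ.get('SPARK_EC2_INVOKE_DIR', os.getcwd())

# Passing cwd= makes a relative identity file path like '../my.pem' resolve
# against the user's directory instead of the spark-ec2 directory.
subprocess.check_call(['ssh', '-i', '../my.pem', '-o', 'ConnectTimeout=3',
                       'root@ec2-host.example.com', 'true'],  # placeholder host
                      cwd=user_dir)
{code}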




[jira] [Commented] (SPARK-3398) Have spark-ec2 intelligently wait for specific cluster states

2014-09-03 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120957#comment-14120957 ]

Nicholas Chammas commented on SPARK-3398:
-

Hey [~joshrosen], does this seem like a good thing to work on?
