[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551486#comment-14551486 ]

Shivaram Venkataraman commented on SPARK-6246:
--

[~srowen] Could you add [~alyaxey] to the developers group and assign this 
issue?

 spark-ec2 can't handle clusters with > 100 nodes
 

 Key: SPARK-6246
 URL: https://issues.apache.org/jira/browse/SPARK-6246
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.3.0
Reporter: Nicholas Chammas
Priority: Minor
 Fix For: 1.5.0


 This appears to be a new restriction, perhaps resulting from our upgrade of 
 boto. Maybe it's a new restriction from EC2. Not sure yet.
 We didn't have this issue around the Spark 1.1.0 time frame from what I can 
 remember. I'll track down where the issue is and when it started.
 Attempting to launch a cluster with 100 slaves yields the following:
 {code}
 Spark AMI: ami-35b1885c
 Launching instances...
 Launched 100 slaves in us-east-1c, regid = r-9c408776
 Launched master in us-east-1c, regid = r-92408778
 Waiting for AWS to propagate instance metadata...
 Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
 ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
 <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
 Traceback (most recent call last):
   File "./ec2/spark_ec2.py", line 1338, in <module>
     main()
   File "./ec2/spark_ec2.py", line 1330, in main
     real_main()
   File "./ec2/spark_ec2.py", line 1170, in real_main
     cluster_state='ssh-ready'
   File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
     statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
     InstanceStatusSet, verb='POST')
   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
     raise self.ResponseError(response.status, response.reason, body)
 boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
 <?xml version="1.0" encoding="UTF-8"?>
 <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
 {code}
 This problem seems to be with {{get_all_instance_status()}}, though I am not 
 sure if other methods are affected too.






[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Alex (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550637#comment-14550637 ]

Alex commented on SPARK-6246:
-

This can be fixed by replacing this line in {{ec2/spark_ec2.py}}:

{code}
statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
{code}

with the following lines, which issue the status requests in batches of 100:

{code}
max_batch = 100
statuses = []
for j in range((len(cluster_instances) + max_batch - 1) // max_batch):
    statuses.extend(conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances[j * max_batch:(j + 1) * max_batch]]))
{code}
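For what it's worth, the same workaround could be factored into a small helper so any other call site hitting the limit could reuse it. This is only a sketch under the same boto 2.x assumptions as the script; the helper name is made up and is not part of any PR:

{code}
def get_all_instance_statuses_batched(conn, cluster_instances, max_batch=100):
    # Hypothetical helper: fetch instance statuses in chunks of at most
    # max_batch IDs, staying under EC2's 100-instance-ID-per-request limit.
    statuses = []
    for start in range(0, len(cluster_instances), max_batch):
        batch = cluster_instances[start:start + max_batch]
        statuses.extend(conn.get_all_instance_status(
            instance_ids=[i.id for i in batch]))
    return statuses
{code}

wait_for_cluster_state() could then call {{get_all_instance_statuses_batched(conn, cluster_instances)}} instead of building one oversized request.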




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Alex (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551157#comment-14551157 ]

Alex commented on SPARK-6246:
-

[~shivaram] Done. This is my first PR. Do I have to do anything else to 
contribute to this ticket?




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551150#comment-14551150 ]

Apache Spark commented on SPARK-6246:
-

User 'alyaxey' has created a pull request for this issue:
https://github.com/apache/spark/pull/6267




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550652#comment-14550652 ]

Shivaram Venkataraman commented on SPARK-6246:
--

Could you send a PR for this?




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-17 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547501#comment-14547501 ]

Shivaram Venkataraman commented on SPARK-6246:
--

I just ran into this problem as well. This definitely does not happen with some 
of the older versions of the script. 




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-03-12 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355403#comment-14355403 ]

Shivaram Venkataraman commented on SPARK-6246:
--

Hmm - this seems like a bad problem, and it looks like an AWS-side change rather 
than a boto change, I guess.
[~nchammas] Similar to the EC2Box issue above, can we also batch calls to 
`get_instances` 100 instances at a time?




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-03-10 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354969#comment-14354969 ]

Nicholas Chammas commented on SPARK-6246:
-

FYI [~shivaram].




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-03-10 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355004#comment-14355004 ]

Sean Owen commented on SPARK-6246:
--

The funny thing is, the typo in that error message ("specificied") makes it 
easy to find some corroboration:

https://github.com/skavanagh/EC2Box/issues/8
https://github.com/worksap-ate/aws-sdk/issues/139

Looks like an AWS SDK limit?




[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-03-10 Thread Nicholas Chammas (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355642#comment-14355642 ]

Nicholas Chammas commented on SPARK-6246:
-

I dunno, I haven't looked into the problem yet (been out all day), but I'm 
surprised that everything else works with > 100 nodes: creating nodes, 
destroying them, getting them. It's just the status check call.

If we have to, sure I'll batch the calls. But I suspect there's a better way to 
do things. I'm surprised boto doesn't just abstract this problem away.

Anyway, I'll look into it and report back.
