> POST HTTP/1.1 failed with code 503 code='RequestLimitExceeded', message='Request limit exceeded.',

AWS throttles the EC2 API on a per-account basis if requests come in too fast (or by some other undocumented measure). I suspect the penalty grows exponentially: in March 2011 I managed to get locked out of both the AWS Console and the command line for hours at a time. It's a good thing the HTTP return code in the case above actually tells you what is happening. The natural response to apparent unreachability is to retry more often, with various tools, which only extends the lockout by hours and leaves the impression that their services have big problems. (AWS Support was nice enough to apologize for setting this trap, which cost me a deadline by a day, but I think they should warn developers in their documentation. There is no reason to keep it secret.)
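
For what it's worth, the workaround I use in my own tooling when that 503 shows up is a plain exponential backoff around the offending call, rather than retrying at a fixed rate. A rough sketch of what I mean, in Java; the throttle check and the class name are placeholders I made up, not actual Whirr or jclouds API:

import java.util.concurrent.Callable;

public class Ec2Backoff {

    // Placeholder check: real code should look at the error code carried by the
    // AWSResponseException rather than the message text.
    static boolean isRequestLimitExceeded(Exception e) {
        return e.getMessage() != null
            && e.getMessage().contains("RequestLimitExceeded");
    }

    // Retry a call with exponentially growing delays when EC2 throttles us,
    // instead of hammering the API and extending the lockout.
    public static <T> T withBackoff(Callable<T> call) throws Exception {
        long delayMs = 1000;                  // start at one second
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isRequestLimitExceeded(e) || attempt >= 7) {
                    throw e;                  // not throttling, or out of retries
                }
                Thread.sleep(delayMs);
                delayMs *= 2;                 // 1s, 2s, 4s, ... roughly two minutes in total
            }
        }
    }
}

Whether something along these lines belongs in jclouds or in Whirr itself is a separate discussion, but backing off instead of retrying harder is the main point.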

The RequestLimitExceeded problem needs to be attacked by carefully eliminating unnecessary AWS API requests, for example by caching their results. The end result would make Whirr faster and usable for larger clusters.
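
To make the caching idea a bit more concrete, the kind of thing I have in mind is holding on to the result of an expensive describe-style call for a short window instead of issuing a fresh request every time some piece of code asks for it. The class below is made up for illustration; it is not existing Whirr or jclouds code:

import java.util.concurrent.Callable;

// Holds the result of an expensive AWS API call for a short time window so
// that repeated lookups during cluster bootstrap reuse the cached value
// instead of each turning into another EC2 request.
public class TimedCache<T> {

    private final Callable<T> loader;   // the real API call
    private final long ttlMs;           // how long a cached value stays valid
    private T value;
    private long loadedAt;

    public TimedCache(Callable<T> loader, long ttlMs) {
        this.loader = loader;
        this.ttlMs = ttlMs;
    }

    public synchronized T get() throws Exception {
        long now = System.currentTimeMillis();
        if (value == null || now - loadedAt > ttlMs) {
            value = loader.call();      // only hit the API when the entry is stale
            loadedAt = now;
        }
        return value;
    }
}

I haven't measured exactly where the requests come from during a launch, but even a short TTL on the most frequent lookups should cut the request count noticeably for a 20-node cluster.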


Paul

On 2011-10-07 07:57, Paolo Castagna wrote:
On 7 October 2011 15:50, Andrei Savu <[email protected]> wrote:
Paolo,

I think we've got a few users who use Whirr to deploy clusters with more than 10 nodes.
I confirm that Apache Whirr goes "up to eleven" [1] nodes. :-)
But I have problems with 20, which isn't much more than 11.

  [1] http://en.wikipedia.org/wiki/Up_to_eleven

My suggestion is to take a look at the configuration page because there are
some settings you can tweak so that Whirr can start larger clusters.
I will.

Tibor, any feedback on this? How are you handling similar issues?
Any help, suggestions and/or hadoop-ec2.properties examples are more than welcome.

Thanks,
Paolo

On Oct 7, 2011 5:07 PM, "Paolo Castagna" <[email protected]> wrote:
Hi,
I am using Apache Whirr 0.6.0-incubating.

When I start a Hadoop cluster on EC2 using 11 datanodes/tasktrackers:
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,11 hadoop-datanode+hadoop-tasktracker
everything seems to go fine. I sometimes see one or two instances that are not able to start correctly, but Whirr seems to terminate those and start new ones.

If I try to run a Hadoop cluster using 20 or more datanodes/tasktrackers, the number of errors increases.

I see a lot of errors like this:

2011-10-07 07:54:50,058 ERROR [jclouds.compute] (user thread 13) << problem applying options to node(eu-west-1/i-eec231a7): org.jclouds.aws.AWSResponseException: request POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='af239496-844a-49c3-99d0-fdf0d01b7f45', requestToken='null', code='RequestLimitExceeded', message='Request limit exceeded.', context='{Response=, Errors=}'}
        at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:74)
        at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:71)
        at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.shouldContinue(BaseHttpCommandExecutorService.java:200)
        at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:165)
        at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:134)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

After a while Whirr gives up and fails to start the cluster.

Any idea on why this happens?

Thanks,
Paolo
