Andrei,

I snipped out a portion of the whirr stdout/stderr and just updated
https://issues.apache.org/jira/browse/WHIRR-488, rather than file a jclouds
bug. It seems more reasonable to sort out the potentially pathological whirr
behavior first, and then file a jclouds defect once that's done.

In any case, here's a good example of what happens: some nodes are
successfully started, and then a whole series of jclouds errors follows,
complaining that no compatible AMI can be resolved (even though nodes
using the specified AMI have already been created):

> Nodes started: [[id=us-east-1/i-381f245d, providerId=i-381f245d,
> group=pageviews-cluster, name=null, location=[id=us-east-1c, scope=ZONE,
> description=us-east-1c, parent=us-east-1, iso3166Codes=[US-VA],
> metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
> family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, hostname=ip-10-40-65-170,
> privateAddresses=[10.40.65.170], publicAddresses=[50.19.189.37],
> hardware=[id=m1.xlarge, providerId=m1.xlarge, name=null,
> processors=[[cores=4.0, speed=2.0]], ram=15360, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0,
> device=/dev/sdd, durable=false, isBootDevice=false], [id=null,
> type=LOCAL, size=420.0, device=/dev/sde, durable=false,
> isBootDevice=false]],
> supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()),
> tags=[]], loginUser=ubuntu, userMetadata={}, tags=[]]]
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
> Starting 6 node(s) with roles [hadoop-datanode, hadoop-tasktracker]
> Unexpected error while starting 15 nodes, minimum 12 nodes for [hadoop-datanode, hadoop-tasktracker] of cluster pageviews-cluster
> java.util.concurrent.ExecutionException: java.util.NoSuchElementException: no image matched predicate:
> And(locationEqualsOrChildOf(us-east-1),And(osFamily(ubuntu),osDescription(ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml),osVersion(10.04),os64Bit(true),osArch(paravirtual)),imageVersion(20101020),imageDescription(ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml))
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>         at org.apache.whirr.compute.StartupProcess.waitForOutcomes(StartupProcess.java:129)
>         at org.apache.whirr.compute.StartupProcess.call(StartupProcess.java:82)
>         at org.apache.whirr.compute.StartupProcess.call(StartupProcess.java:40)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:636)
> Caused by: java.util.NoSuchElementException: no image matched predicate:
> And(locationEqualsOrChildOf(us-east-1),And(osFamily(ubuntu),osDescription(ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml),osVersion(10.04),os64Bit(true),osArch(paravirtual)),imageVersion(20101020),imageDescription(ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml))
>         at org.jclouds.compute.domain.internal.TemplateBuilderImpl.throwNoSuchElementExceptionAfterLoggingImageIds(TemplateBuilderImpl.java:620)
>         at org.jclouds.compute.domain.internal.TemplateBuilderImpl.build(TemplateBuilderImpl.java:608)
>         at org.jclouds.ec2.compute.strategy.EC2CreateNodesInGroupThenAddToSet.execute(EC2CreateNodesInGroupThenAddToSet.java:135)
>         at org.jclouds.compute.internal.BaseComputeService.createNodesInGroup(BaseComputeService.java:199)
>         at org.jclouds.aws.ec2.compute.AWSEC2ComputeService.createNodesInGroup(AWSEC2ComputeService.java:130)
>         at org.apache.whirr.compute.NodeStarter.call(NodeStarter.java:55)


After a number of these errors, whirr decides there have been too many
failures and tries to destroy any nodes it has already created. Then it
just hangs.
