[
https://issues.apache.org/jira/browse/WHIRR-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982259#action_12982259
]
Tibor Kiss edited comment on WHIRR-167 at 1/16/11 4:07 AM:
-----------------------------------------------------------
The OptionParser is already initialized with the OptionSpecs in the constructor
of AbstractClusterSpecCommand, while the ConfigSpec is instantiated only after
that, based on the collected OptionSpecs. As a result,
ClusterSpec#getInstanceTemplates() cannot be used for generating the options.
jOptSimple has to know in advance exactly which options can be passed on the
argument list.
Something like the following can be constructed with the jOptSimple API.
{code}
whirr.instance-templates=1 jt+nn,4 dn+tt
whirr.instance-templates-max-percent-failures=100 jt+nn,60 dn+tt
whirr.instance-templates-minimum-number-of-instances=1 jt+nn,3 dn+tt
{code}
or by enumerating only the roles that are not strictly limited to 100%
successful nodes.
{code}
whirr.instance-templates=1 jt+nn,4 dn+tt
whirr.instance-templates-max-percent-failures=60 dn+tt
whirr.instance-templates-minimum-number-of-instances=3 dn+tt
{code}
Commons Configuration supports hierarchical configuration; jOptSimple does
not. Exactly the same constraint applies as in the case of
whirr.instance-templates, in the sense that the hierarchy can be expressed
only on the value side.
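Since the hierarchy lives in the value, each of these properties has to be split into its per-role entries by hand. The following is only a minimal sketch of that parsing, not Whirr code: the class TemplateValueParser and its methods are hypothetical names, and the fallback for role groups that are not enumerated (all requested nodes must start) is an assumption based on the second form above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Whirr): parses values such as "1 jt+nn,4 dn+tt". */
final class TemplateValueParser {

    private TemplateValueParser() {}

    /**
     * Splits a comma-separated list of "<number> <roles>" entries into a map
     * keyed by the role group (for example "dn+tt"), keeping declaration order.
     */
    static Map<String, Integer> parse(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        for (String entry : value.split(",")) {
            // Each entry is a number, whitespace, then the role group.
            String[] parts = entry.trim().split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Expected '<number> <roles>' but got: '" + entry + "'");
            }
            result.put(parts[1].trim(), Integer.parseInt(parts[0]));
        }
        return result;
    }

    /**
     * Returns the limit configured for a role group, or defaultValue when the
     * group is not enumerated in the property value.
     */
    static int limitFor(Map<String, Integer> limits, String roles, int defaultValue) {
        Integer configured = limits.get(roles);
        return configured != null ? configured : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Integer> templates = parse("1 jt+nn,4 dn+tt");
        Map<String, Integer> minimums = parse("3 dn+tt");
        for (Map.Entry<String, Integer> e : templates.entrySet()) {
            // Assumption: a role group without an explicit minimum needs all its nodes.
            int required = limitFor(minimums, e.getKey(), e.getValue());
            System.out.println(e.getKey() + ": requested " + e.getValue()
                + ", minimum " + required);
        }
    }
}
```

With this shape, the command line only needs one flat option per property, and the per-role structure is recovered after jOptSimple has handed over the raw string.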
I was looking into the history of how these constructs were introduced.
Commons Configuration was introduced in WHIRR-75, and jOptSimple later in
WHIRR-102.
was (Author: tibor.kiss):
The OptionParser is already initialized with the OptionSpecs in the constructor
of AbstractClusterSpecCommand, while the ConfigSpec is instantiated only after
that, based on the collected OptionSpecs. As a result,
ClusterSpec#getInstanceTemplates() cannot be used for generating the options.
jOptSimple has to know in advance exactly which options can be passed on the
argument list.
Something like the following can be constructed with the jOptSimple API.
{code}
whirr.instance-templates=1 jt+nn,4 dn+tt
whirr.instance-templates.max-percent-failures=100 jt+nn,60 dn+tt
whirr.instance-templates.minimum-number-of-instances=1 jt+nn,3 dn+tt
{code}
or by enumerating only the roles that are not strictly limited to 100%
successful nodes.
{code}
whirr.instance-templates=1 jt+nn,4 dn+tt
whirr.instance-templates.max-percent-failures=60 dn+tt
whirr.instance-templates.minimum-number-of-instances=3 dn+tt
{code}
Commons Configuration supports hierarchical configuration; jOptSimple does
not. Exactly the same constraint applies as in the case of
whirr.instance-templates, in the sense that the hierarchy can be expressed
only on the value side.
I was looking into the history of how these constructs were introduced.
Commons Configuration was introduced in WHIRR-75, and jOptSimple later in
WHIRR-102.
> Improve bootstrapping and configuration to be able to isolate and repair or
> evict failing nodes on EC2
> ------------------------------------------------------------------------------------------------------
>
> Key: WHIRR-167
> URL: https://issues.apache.org/jira/browse/WHIRR-167
> Project: Whirr
> Issue Type: Improvement
> Environment: Amazon EC2
> Reporter: Tibor Kiss
> Assignee: Tibor Kiss
> Attachments: whirr-167-1.patch, whirr.log
>
>
> The cluster startup process on Amazon EC2 instances is currently very
> unstable. As the number of nodes to be started increases, the startup
> process fails more often, but sometimes even a 2-3 node startup fails. We
> don't know how many instance startups are in progress at the same time on
> the Amazon side when it fails or when it succeeds.
> The only thing I see is that when I start around 10 nodes, the rate of
> failing nodes is higher than with a smaller number of nodes, and it is not
> directly proportional to the number of nodes; the probability that some
> nodes fail looks exponentially higher.
> Looking into BootstrapClusterAction.java, there is a note "// TODO: Check for
> RunNodesException and don't bail out if only a few " which indicates that the
> current startup process is unreliable. So we should improve it.
> We could add a "max percent failure" property (per instance template), so
> that if the number of failures exceeds this value, the whole cluster fails to
> launch and is shut down. For the master node the value would be 100%, but for
> datanodes it would be more like 75%. (Tom White also mentioned this in an email.)
> Let's discuss whether there are any other requirements for this improvement.
--
This message is automatically generated by JIRA.