Github user GenTang commented on the pull request:
https://github.com/apache/spark/pull/3986#issuecomment-69473316
However, I ran into a really strange error a moment ago.
I launched a cluster containing 1 master and 1 slave with the script.
add_tag on the master succeeded after two tries, and add_tag on the slave
succeeded without any error. However, EC2 then threw an
`InvalidInstanceID.NotFound` error for the slave node at:
```
for i in cluster_instances:
    i.update()
```
in the wait_for_cluster_state function. It seems the instance information
had not yet propagated far enough for the update action, even though it had
propagated far enough for the add_tag action to succeed. I tried several
times and it happened only once; I am not sure why. Since
wait_for_cluster_state is used by the `launch` and `start` actions (which
need more than a minute to reach the `ssh-ready` state) and by `destroy`
(which takes about 1 second to reach the `terminated` state), a workaround
might be to wait a bit longer before each update attempt by making the
following change:
```
while True:
    time.sleep(5 * num_attempts + 1)
```
at line 724
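A more targeted alternative to a blanket longer sleep might be to retry the update call itself while EC2 reports the instance as not found, since the error is transient eventual-consistency behavior. This is only a sketch, not the actual spark-ec2 code: the helper name `update_with_retry` and the `is_not_found` predicate are hypothetical, and the caller would supply a check matching boto's `InvalidInstanceID.NotFound` error code.

```python
import time


def update_with_retry(instance, is_not_found, max_attempts=5, base_wait=5):
    """Retry instance.update() while EC2 has not yet propagated the
    instance metadata (eventual consistency after RunInstances).

    `is_not_found` decides whether an exception is the transient
    InvalidInstanceID.NotFound error; anything else is re-raised
    immediately, as is the last failure once attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return instance.update()
        except Exception as e:
            if not is_not_found(e) or attempt == max_attempts:
                raise
            # Same linearly growing wait as the change suggested above.
            time.sleep(base_wait * attempt + 1)
```

This keeps the fast path fast (`destroy` reaches `terminated` in about a second, so the first attempt usually succeeds) while still tolerating the rare slow propagation seen during `launch`.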