Hi Artem/Adam,
Thanks very much for your input on this issue: we tried a few other things
today.
We were finally able to install Spark through the Ambari UI: it was listed in a
failed state in the UI, so after doing the upgrade to 2.1.2, I just tried the
install again, and this time the install was successful and I was able to start
the service.
Next we tried to install Kafka.
When we install Kafka through the Ambari GUI for the first time, we get stuck in
that weird state I mentioned last time, where it won’t proceed beyond
“recommended configurations”. Ambari shows it has a Kafka service, but the
Broker doesn’t get installed, and there’s no configuration on the file system.
If we delete Kafka through the API and re-install through the API, we can
install the Kafka service and component, then install the component to a node,
and it installs successfully. Configuration files were also created on the file
system in /etc/kafka, but the configurations were blank in the Ambari UI. (I
tried restarting ambari-server but there was no change.) Kafka would not start,
however, probably because configurations were missing, and Ambari would not
allow us to add or set up the configuration through the UI: it’s just blank.
We can see there’s some duplicate key issue in tables (pasted below) when we
try to perform these INSERT and DELETE operations. We’re tailing the postgres
log.
At this point we’ve deleted the service components from the cluster and we’re
trying to track down the entries in the tables so we can delete entries
associated with the kafka service.
We’ll attempt to re-install if we find records that prove to be in the way.
We noticed in the logs that when we install Kafka the message “kafka-env not
found in dictionary” appears (below), which seems to show there’s a disconnect
between service configuration templates and actual service configurations. When
we had trouble installing Spark a while ago we saw this same message, except it
was “spark-env” that was not found.
raise Fail("Configuration parameter '" + self.name + "' was not found in
configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'kafka-env'
was not found in configurations dictionary!
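One way to confirm the missing-configuration theory is to compare the config types Ambari reports under Clusters/desired_configs with the types the service expects. The following is a minimal sketch, assuming the response shape shown in the sample; the function name and the list of expected types are illustrative, not taken from the Ambari code.

```python
import json


def missing_config_types(cluster_response_text, expected_types):
    """Return the expected config types absent from Clusters/desired_configs."""
    response = json.loads(cluster_response_text)
    desired = response.get("Clusters", {}).get("desired_configs", {})
    return [t for t in expected_types if t not in desired]


# Sample response (assumed shape) as returned by
# clusters/<cluster_name>?fields=Clusters/desired_configs
sample = json.dumps({
    "Clusters": {
        "desired_configs": {
            "hdfs-site": {"tag": "version1"},
            "spark-env": {"tag": "version1"},
        }
    }
})

# The expected types are an assumption made for this illustration.
missing = missing_config_types(sample, ["kafka-env", "kafka-broker", "spark-env"])
print(missing)
```

If kafka-env shows up in the missing list, the service was registered without its configurations, which would match the error above.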
ERROR: update or delete on table "servicecomponentdesiredstate" violates
foreign key constraint "hstcomponentstatecomponentname" on table
"hostcomponentstate"
DETAIL: Key (component_name,cluster_id,service_name)=(KAFKA_BROKER,2,KAFKA) is
still referenced from table "hostcomponentstate".
STATEMENT: DELETE FROM servicecomponentdesiredstate WHERE (((component_name =
$1) AND (cluster_id = $2)) AND (service_name = $3))
ERROR: current transaction is aborted, commands ignored until end of
transaction block
STATEMENT: SELECT 1
ERROR: duplicate key value violates unique constraint
"servicecomponentdesiredstate_pkey"
STATEMENT: INSERT INTO servicecomponentdesiredstate (component_name,
desired_state, cluster_id, service_name, desired_stack_id) VALUES ($1, $2, $3,
$4, $5)
ERROR: current transaction is aborted, commands ignored until end of
transaction block
STATEMENT: SELECT 1
ERROR: update or delete on table "clusterservices" violates foreign key
constraint "srvccmponentdesiredstatesrvcnm" on table
"servicecomponentdesiredstate"
DETAIL: Key (service_name,cluster_id)=(KAFKA,2) is still referenced from table
"servicecomponentdesiredstate".
STATEMENT: DELETE FROM clusterservices WHERE ((cluster_id = $1) AND
(service_name = $2))
ERROR: current transaction is aborted, commands ignored until end of
transaction block
STATEMENT: SELECT 1
ERROR: duplicate key value violates unique constraint "clusterservices_pkey"
STATEMENT: INSERT INTO clusterservices (service_name, service_enabled,
cluster_id) VALUES ($1, $2, $3)
ERROR: current transaction is aborted, commands ignored until end of
transaction block
STATEMENT: SELECT 1
ERROR: duplicate key value violates unique constraint
"servicecomponentdesiredstate_pkey"
STATEMENT: INSERT INTO servicecomponentdesiredstate (component_name,
desired_state, cluster_id, service_name, desired_stack_id) VALUES ($1, $2, $3,
$4, $5)
ERROR: current transaction is aborted, commands ignored until end of
transaction block
STATEMENT: SELECT 1
LOG: received SIGHUP, reloading configuration files
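The foreign key errors above already say which table references which, so the order in which leftover rows would have to be removed can be derived from the log itself. This is a rough sketch, assuming the log text is available as a string; the function name is hypothetical and the sample text is copied from two of the errors above.

```python
import re

# Matches the PostgreSQL foreign key violation message after whitespace is collapsed.
FK_ERROR = re.compile(
    r'update or delete on table "(?P<parent>\w+)" violates foreign key '
    r'constraint "\w+" on table "(?P<child>\w+)"'
)


def delete_order(log_text):
    """Order tables so that referencing tables come before the tables they reference."""
    flat = " ".join(log_text.split())  # undo the line wrapping in the pasted log
    edges = {(m["child"], m["parent"]) for m in FK_ERROR.finditer(flat)}
    remaining = {table for edge in edges for table in edge}
    ordered = []
    while remaining:
        # A table is ready once no remaining table still references it.
        ready = sorted(
            t for t in remaining
            if not any(parent == t and child in remaining for child, parent in edges)
        )
        if not ready:
            raise ValueError("circular foreign key references in log")
        ordered.extend(ready)
        remaining -= set(ready)
    return ordered


sample_log = '''
ERROR: update or delete on table "servicecomponentdesiredstate" violates
foreign key constraint "hstcomponentstatecomponentname" on table
"hostcomponentstate"
ERROR: update or delete on table "clusterservices" violates foreign key
constraint "srvccmponentdesiredstatesrvcnm" on table
"servicecomponentdesiredstate"
'''

order = delete_order(sample_log)
print(order)
```

The result only reflects the constraints that appear in the log; other tables may reference the same rows without having produced an error yet.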
We’ll let you know if we make more progress.
Cheers
Ken
From: <[email protected]<mailto:[email protected]>> on behalf of Artem Ervits
<[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Friday, October 30, 2015 at 9:32 AM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Any way to reset Ambari Install Wizard?
I am guessing his issues are with the Ambari database, and he's reluctant to
make any changes in the database directly. I'm trying to narrow the issue down
so he can delete just the bad row. In that sense, upgrading Ambari is not a big
deal. Resetting the database, creating a new cluster and importing data is a
big deal. What I would do is take account of all the services he has running.
Once he knows what should be in Ambari and what shouldn't, go through every
table in the Ambari database and see if that service or any reference to it
exists. Purge that row and see where that takes you. I personally had similar
issues in Ambari with earlier releases; 2.1.2 addressed many issues in the UI
and in configuration.
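The table-by-table search can be reduced to a handful of read-only queries. The sketch below only builds the SQL text: it assumes the Ambari tables live in a Postgres schema named "ambari", that references to a service are stored in a column called service_name (as in the tables seen in the log), and that a driver with %s placeholders such as psycopg2 runs the statements. All names here are illustrative.

```python
# Lists every table in the schema that has the given column (standard information_schema view).
FIND_TABLES_SQL = (
    "SELECT table_name FROM information_schema.columns "
    "WHERE table_schema = %s AND column_name = %s ORDER BY table_name"
)


def count_queries(table_names, column="service_name"):
    """Build one parameterized row-count query per table."""
    queries = []
    for table in table_names:
        # Identifiers cannot be passed as parameters, so validate them instead.
        if not table.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {table!r}")
        queries.append(f'SELECT COUNT(*) FROM "{table}" WHERE {column} = %s')
    return queries


# Table names as they appear in the postgres log in this thread.
queries = count_queries(["clusterservices", "servicecomponentdesiredstate"])
for query in queries:
    print(query)
```

Each count query would be executed with the service name (for example KAFKA) as its single parameter; any table with a non-zero count still holds a reference to the service.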
On Fri, Oct 30, 2015 at 11:51 AM, Adam Gover
<[email protected]<mailto:[email protected]>> wrote:
Hi Artem,
Valid point. I was surprised you suggested he update to 2.1.2 in the midst of
this, however. Doesn’t that increase the risk of further problems?
Thanks
Adam
From: <[email protected]<mailto:[email protected]>> on behalf of Artem Ervits
<[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Friday, October 30, 2015 at 11:25 AM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Any way to reset Ambari Install Wizard?
Note that if a bad config is included in your JSON, which may happen when you
gather the configs, it may come back once you reset and reapply, and all these
steps will be useless. We need to figure out what the issue is. I want him to
avoid going the reset route until we exhaust every other option.
On Fri, Oct 30, 2015 at 10:47 AM, Adam Gover
<[email protected]<mailto:[email protected]>> wrote:
Hi There Ken,
Let’s try this again… now actually complete.
So I’ve been following along on this thread hoping someone would come back with
a better solution than the one I have. Since I haven’t seen any, I’ll provide
the details of my solution.
Prereqs/comments:
* Tested only on Ambari 2.1.2, but should work on Ambari 2+ (it will also
work with some tweaks on 1.6, but won’t work on 1.7)
* Tested using an external postgres database but should also work with mysql
* Test this on your own as it tends to have issues under some circumstances
I can’t provide the code I use to accomplish all this, but I’ll provide an
outline which should allow you to do the same thing.
General info:
Base path for access to the REST API is:
http://<ambari host>:8080/api/v1
This can be accessed using a standard curl call similar to:
curl -u admin:admin -H 'X-Requested-By: ambari' http://<ambari host>:8080/api/v1
To indicate the path for each call I’ll just say “goto rest” and provide the
additional path info (any options needed will need to be inserted before the
URL). Also note that in some cases I’m copying parts of the scripts I’m using, so
the values of the variables need to be populated.
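For anyone scripting this rather than typing curl calls, the same request can be prepared with the Python standard library. This is a minimal sketch: the helper name, the example host and the default admin/admin credentials are placeholders, and the request is only built here, not sent.

```python
import base64
import urllib.request


def ambari_request(host, path, user="admin", password="admin", method="GET", body=None):
    """Build (but do not send) a request against the Ambari REST base path."""
    url = f"http://{host}:8080/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = body.encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Basic {token}")  # same as curl -u user:password
    req.add_header("X-Requested-By", "ambari")         # same as curl -H
    return req


# ambari.example.com is a placeholder host name.
req = ambari_request("ambari.example.com", "clusters")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; the "goto rest" paths in the steps below would go in the path argument.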
1. Backup all external databases (hive/oozie/ambari)
2. Backup the filesystem after forcing a check point
3. Before downing ambari collect a complete set of configs:
* Get list of all configs available
Goto rest: clusters/${cluster_name}/?fields=Clusters/desired_configs
* Using the list retrieve ALL the json config files for the cluster
Goto rest:
http://${ambari_host}:8080/api/v1/clusters/${cluster_name}/configurations?(type=${config_type}&tag=${tag})
So cluster_name=your defined cluster name, config_type=config_filename, tag=the
most recent version of this config file (this is provided by the first rest
call)
Note that the output here is NOT usable directly – you will need to slightly
reformat these files prior to reimporting them
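The collection and reformatting in step 3 could be sketched roughly as follows. The response shapes are assumptions based on what the two REST calls typically return (a desired_configs map of type to tag, and an items list carrying type, tag and properties), and the desired_config payload mirrors what the configs.sh script sends; verify both against your own cluster before relying on it. Function names and the example host are hypothetical.

```python
from urllib.parse import quote


def config_urls(base_url, cluster_name, cluster_response):
    """Map each config type to the URL that fetches its current tag."""
    desired = cluster_response["Clusters"]["desired_configs"]
    urls = {}
    for config_type, info in sorted(desired.items()):
        urls[config_type] = (
            f"{base_url}/clusters/{quote(cluster_name)}/configurations"
            f"?type={quote(config_type)}&tag={quote(info['tag'])}"
        )
    return urls


def to_desired_config(config_response):
    """Reformat a fetched config into a body that can be PUT to the cluster."""
    item = config_response["items"][0]
    return {
        "Clusters": {
            "desired_config": {
                "type": item["type"],
                "tag": item["tag"],
                "properties": item.get("properties", {}),
            }
        }
    }


# Sample data in the assumed shapes; host, cluster name and values are placeholders.
cluster_response = {"Clusters": {"desired_configs": {"hdfs-site": {"tag": "version1"}}}}
urls = config_urls("http://ambari.example.com:8080/api/v1", "c1", cluster_response)

config_response = {
    "items": [
        {"type": "hdfs-site", "tag": "version1", "properties": {"dfs.replication": "3"}}
    ]
}
payload = to_desired_config(config_response)
print(urls["hdfs-site"])
print(payload)
```

Saving each payload to its own file before the reset keeps the reformatting separate from the later push, so a bad config can be inspected or left out.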
1. Next shut down Ambari
2. On the command line as root execute “ambari-server reset”
3. Set up the base cluster name:
Goto rest: OPTION: -d '{"Clusters":{"version":"HDP-2.2"}}'
/clusters/${cluster_name}
4. For each host on the cluster, add it to the cluster
Goto rest: OPTION -X POST /clusters/${cluster_name}/hosts/${hostname}
5. Push ALL your configs captured in step 3 of the first part to the cluster.
You may want to use this script:
/var/lib/ambari-server/resources/scripts/configs.sh
NOTE I do this using perl; it’s basically a raw read of each file that pushes
it using PUT to
Goto rest: OPTION -X PUT /clusters/${cluster_name}
6. Next add each service & its associated components
To add service:
Goto rest: OPTION -X POST /clusters/${cluster_name}/services/${service_name}
To add component:
Goto rest: OPTION -X POST
/clusters/${cluster_name}/services/${service_name}/components/${component}
7. Next for each host apply the required components using the following 2 rest
calls
Goto rest: OPTION -X POST
/clusters/${cluster_name}/hosts/${hostname}/host_components/${component}
Goto rest: OPTION -X PUT OPTION -d '{"HostRoles":{"state":"INSTALLED"}}'
/clusters/${cluster_name}/hosts/${hostname}/host_components/${component}
8. Next set cluster status
Goto rest: OPTION -X POST OPTION -d '{"CLUSTER_CURRENT_STATUS":
"{\"clusterState\":\"CLUSTER_STARTED_5\"}"}' /persist
9. Now set each service to an installed state
Goto rest: OPTION -X PUT OPTION -d '{"ServiceInfo":{"state":"INSTALLED"}}'
/clusters/${cluster_name}/services/${service_name}
10. Finally set the cluster itself to INSTALLED. As far as I know this is
best done using SQL; I’m sure there is a rest call but I haven’t found it yet
UPDATE clusters SET provisioning_state='INSTALLED', security_type='NONE' WHERE
cluster_name='${cluster_name}';
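The final SQL update is safer when the cluster name is passed as a bound parameter and the number of affected rows is checked before committing. The sketch below shows the idea against an in-memory SQLite table standing in for the real Ambari clusters table; the three-column table, the sample row and the function name are assumptions for illustration, and a Postgres driver would use its own placeholder style.

```python
import sqlite3

# In-memory stand-in for the Ambari database; the real clusters table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clusters ("
    "cluster_name TEXT PRIMARY KEY, provisioning_state TEXT, security_type TEXT)"
)
conn.execute("INSERT INTO clusters VALUES (?, ?, ?)", ("c1", "INIT", "NONE"))
conn.commit()


def mark_installed(connection, cluster_name):
    """Set the cluster to INSTALLED, refusing to commit unless exactly one row changed."""
    cursor = connection.execute(
        "UPDATE clusters SET provisioning_state = 'INSTALLED', security_type = 'NONE' "
        "WHERE cluster_name = ?",
        (cluster_name,),
    )
    if cursor.rowcount != 1:
        connection.rollback()
        raise RuntimeError(f"expected to update 1 row, updated {cursor.rowcount}")
    connection.commit()
    return cursor.rowcount


updated = mark_installed(conn, "c1")
print(updated)
```

The row-count check guards against a mistyped cluster name silently updating nothing.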
ALL of this can be automated and unless you have a 2 node cluster I would.
NOTES – this process will work with HA & kerberized clusters but will need
additional steps (especially for kerberos)
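As a starting point for automating the outline, the REST calls can be generated as an ordered plan and reviewed before anything is sent. This sketch follows the step order given above but leaves out the config push and the final SQL update; the function name, the input layout (service to component to hosts) and the sample values are hypothetical.

```python
import json


def rebuild_plan(cluster, stack, hosts, services):
    """Return (method, path, body) tuples in the order of the outline.

    services maps service name -> {component name: [hostnames]}.
    """
    base = f"/clusters/{cluster}"
    plan = [("POST", base, {"Clusters": {"version": stack}})]
    plan += [("POST", f"{base}/hosts/{host}", None) for host in hosts]
    # The config push would go here; it is omitted from this sketch.
    for service, components in services.items():
        plan.append(("POST", f"{base}/services/{service}", None))
        for component in components:
            plan.append(("POST", f"{base}/services/{service}/components/{component}", None))
    installed = {"HostRoles": {"state": "INSTALLED"}}
    for components in services.values():
        for component, component_hosts in components.items():
            for host in component_hosts:
                path = f"{base}/hosts/{host}/host_components/{component}"
                plan.append(("POST", path, None))
                plan.append(("PUT", path, installed))
    # The persisted value is itself a JSON string, hence the nested dumps.
    status = {"CLUSTER_CURRENT_STATUS": json.dumps({"clusterState": "CLUSTER_STARTED_5"})}
    plan.append(("POST", "/persist", status))
    for service in services:
        plan.append(("PUT", f"{base}/services/{service}", {"ServiceInfo": {"state": "INSTALLED"}}))
    return plan


plan = rebuild_plan("c1", "HDP-2.2", ["n1", "n2"], {"KAFKA": {"KAFKA_BROKER": ["n1"]}})
for method, path, body in plan:
    print(method, path, json.dumps(body) if body is not None else "")
```

Printing the plan first makes it easy to compare against the services and hosts recorded before the reset.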
Anyway, I hope this helps. It’s complicated but doable, and will save you
copying/rebuilding, which in larger clusters is really not doable.
Cheers
Adam
From: Ken Barclay <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Friday, October 30, 2015 at 2:34 AM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Any way to reset Ambari Install Wizard?
Hi Artem,
I upgraded all Ambari components to 2.1.2, restarted everything, and after
logging in, restarted all components where it was indicated.
I tried the Add Service wizard for Kafka, and got to the page that allows me to
assign masters and such, but clicking Next after that takes me to Customize
Services, which gets stuck because the Next button on that page is never
sensitized. It just freezes there, saying it has recommended configurations,
with the update icon spinning in the middle. All I can do is click Back at that
point.
Anything else I can try?
Thanks
Ken
From: Artem Ervits <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Thursday, October 29, 2015 at 1:55 PM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Any way to reset Ambari Install Wizard?
Please upgrade to latest 2.1.2 and restart all agents and Ambari server.
Ctrl-shft-r on browser after you navigate to ambari URL. Login and let me know
if it still shows same problem.
On Oct 29, 2015 10:19 AM, "Ken Barclay"
<[email protected]<mailto:[email protected]>> wrote:
Hi Artem,
I started with 2.0.1, and upgraded it to 2.1 back in August.
From: Artem Ervits <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Thursday, October 29, 2015 at 2:09 AM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Any way to reset Ambari Install Wizard?
What version of Ambari are you running?
On Oct 27, 2015 6:51 PM, "Ken Barclay"
<[email protected]<mailto:[email protected]>> wrote:
Hello,
I’m returning to an issue we’ve left hanging since July: we now have to fix
Ambari on this cluster or take the whole cluster down and reinstall from
scratch.
Our situation is that although our HDP 2.2 cluster is running well, Ambari
cannot be used to install anything because the wizard is broken.
I did a restart of Ambari server and agents per Artem, but without knowing
exactly what changes to make to the postgres tables I’m reluctant to try that
part. We also tried to add a new component (Spark) using the Ambari API instead
of the wizard, but that also failed, as did trying to remove the Spark (again
via the API) that had failed to install.
We have 1.5T of monitoring data on this 4-node cluster that we want to preserve.
The cluster is dedicated to storing metrics in HBase via OpenTSDB and that is
all it is used for.
I just want to confirm with the group that since Ambari can only be used to
manage a cluster that it installed itself, our best option in this scenario
would be to:
Shut down monitoring
Copy all the data to another cluster
Completely remove Ambari and HDP per
https://cwiki.apache.org/confluence/display/AMBARI/Host+Cleanup+for+Ambari+and+Stack
Do a fresh install of HDP 2.2 using the latest Ambari, and
Copy the data back to the new cluster.
Please let us know if this is a valid approach
Thanks
Ken
From: <[email protected]<mailto:[email protected]>> on behalf of Artem Ervits
<[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Tuesday, July 28, 2015 at 12:48 PM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Any way to reset Ambari Install Wizard?
Try to restart the Ambari server and agents, then stop and start services;
sometimes services need to announce themselves to Ambari as installed. Always
refer to the ambari-server log. Worst case scenario, delete the Ambari_metrics
service with the API and clean up the postgres DB manually; the tables to
concentrate on are hostservicedesiredstate, servicedesiredstate etc. This
should be a last resort.
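For the API part of that last resort, a service generally has to be stopped before it can be deleted. The sketch below only builds the two calls; it assumes the service is registered as AMBARI_METRICS and that setting its state to INSTALLED stops it, and the function and cluster names are placeholders.

```python
import json


def service_removal_calls(cluster, service):
    """Return the (method, path, body) calls to stop a service and then delete it."""
    path = f"/clusters/{cluster}/services/{service}"
    stop_body = json.dumps({"ServiceInfo": {"state": "INSTALLED"}})
    return [
        ("PUT", path, stop_body),  # INSTALLED means installed but not running
        ("DELETE", path, None),    # only attempt once the stop request has finished
    ]


calls = service_removal_calls("c1", "AMBARI_METRICS")
for method, path, body in calls:
    print(method, path, body or "")
```

Each call would also need the credentials and the X-Requested-By header; the manual database cleanup is a separate step and is not covered here.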
On Tue, Jul 28, 2015 at 3:11 PM, Benoit Perroud
<[email protected]<mailto:[email protected]>> wrote:
Some manual update in the DB is most likely needed.
*WARNING* use this at your own risk
The table that needs to be updated is cluster_version.
As far as I tested 2.1, it required less manual intervention than 2.0.1.
Upgrade has a retry button for most of the steps, and this is really cool.
Hope this helps.
Benoit
2015-07-28 20:01 GMT+02:00 Ken Barclay
<[email protected]<mailto:[email protected]>>:
Hello,
I upgraded a small test cluster from HDP 2.1 to HDP 2.2 and Ambari 2.0.1. In
following the steps to replace Nagios + Ganglia with the Ambari Metrics System
using the Ambari Wizard, an install failure occurred on one node due to an
outdated glibc library. I updated glibc and verified the Metrics packages could
be installed, but couldn’t go back and finish the installation through the
wizard. The problem is: it flags some of the default settings, saying they need
to be changed, but it very quickly skips past the screen where those settings
can be changed, without allowing anything to be entered. So the button that
allows you to proceed with the installation never becomes enabled.
I subsequently manually finished the Metrics installation using the Ambari API
and have it running in Distributed mode. But Ambari’s wizard cannot be used for
anything now: the same problem described above occurs for every service I try
to install.
Can Ambari be reset somehow in this situation, or do I need to reinstall it?
Or do you recommend installing 2.1?
Thanks
Ken