> On July 20, 2017, 10:14 a.m., Attila Doroszlai wrote:
> > Normally the property is added during Ambari upgrade: initially with
> > default value of "1024", then updated to "1024m" by `UpgradeCatalog222`.
> > (Try upgrading from Apache Ambari 2.2.1 to 2.5.2.)
> >
> > The root cause of the problem is that `zk_server_heapsize` is referenced in
> > `zookeeper-env` (the `content`) in BigInsights 4.2, but the property itself
> > is missing. It is then added during stack upgrade with its raw default
> > value.
> >
> > I think the proper fix is to add the missing property in the BI 4.2 stack
> > definition. The current patch would be a nice workaround if there already
> > were clusters with the broken value.
>
> Jonathan Hurley wrote:
>     I think that there are clusters with the broken value today.

Ah, I see what you're saying. So, if we added it to the BI stack, then it would get taken care of automatically upon Ambari Server upgrade. We should do that.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60986/#review181042
-----------------------------------------------------------


On July 19, 2017, 8:13 p.m., Alejandro Fernandez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60986/
> -----------------------------------------------------------
>
> (Updated July 19, 2017, 8:13 p.m.)
>
>
> Review request for Ambari, Di Li, Jonathan Hurley, Sumit Mohanty, Sid Wagle,
> and Tim Thorpe.
>
>
> Bugs: AMBARI-21528
>     https://issues.apache.org/jira/browse/AMBARI-21528
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Repro Steps:
>
> * Installed BI 4.2.0 cluster on IBM Ambari 2.2.2 with Zookeeper
> * Upgraded Ambari to 2.5.2.0-146
> * Registered HDP 2.6.2.0 repo, installed packages
> * Ran service checks
> * Started Express Upgrade
>
> Result: _Service Check ZooKeeper_ step failed with {{KeeperErrorCode =
> ConnectionLoss for /zk_smoketest}}
>
> This was caused by Zookeeper dying immediately during restart:
> ```
> Error occurred during initialization of VM
> Too small initial heap
> ```
>
> Before EU
> ```
> export JAVA_HOME=/usr/jdk64/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64
> export ZOOKEEPER_HOME=/usr/iop/current/zookeeper-server
> export ZOO_LOG_DIR=/var/log/zookeeper
> export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
> export SERVER_JVMFLAGS=-Xmx1024m
> export JAVA=$JAVA_HOME/bin/java
> export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*
> ```
>
> After EU
> ```
> export JAVA_HOME=/usr/jdk64/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64
> export ZOOKEEPER_HOME=/usr/hdp/current/zookeeper-client
> export ZOO_LOG_DIR=/var/log/zookeeper
> export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
> export SERVER_JVMFLAGS=-Xmx1024
> export JAVA=$JAVA_HOME/bin/java
> ```
>
> Note the missing "m" in the memory setting: without a unit suffix the JVM
> reads the value as bytes, hence "Too small initial heap".
>
> The zookeeper-env template contains,
> ```
> export SERVER_JVMFLAGS={{zk_server_heapsize}}
> ```
>
> In this cluster, zookeeper-env contains,
> zk_server_heapsize: "1024"
>
> Meanwhile, params_linux.py is inconsistent about appending the letter "m":
> only the fallback default carries the suffix, so a configured value of
> "1024" is passed through unchanged.
> ```
> zk_server_heapsize_value = str(default('configurations/zookeeper-env/zk_server_heapsize', "1024m"))
> zk_server_heapsize = format("-Xmx{zk_server_heapsize_value}")
> ```
>
> Instead, it should be,
> ```
> zk_server_heapsize_value = str(default('configurations/zookeeper-env/zk_server_heapsize', "1024"))
> zk_server_heapsize_value = zk_server_heapsize_value.strip()
> # Append "m" only when the value ends in a digit, i.e. has no unit suffix yet.
> if len(zk_server_heapsize_value) > 0 and zk_server_heapsize_value[-1].isdigit():
>   zk_server_heapsize_value = zk_server_heapsize_value + "m"
> zk_server_heapsize = format("-Xmx{zk_server_heapsize_value}")
> ```
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5/package/scripts/params_linux.py 0780d2e
>
>
> Diff: https://reviews.apache.org/r/60986/diff/2/
>
>
> Testing
> -------
>
> Python unit tests passed,
>
> ----------------------------------------------------------------------
> Total run:1161
> Total errors:0
> Total failures:0
> OK
>
>
> Thanks,
>
> Alejandro Fernandez
>
>
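
For anyone who wants to check the suffix handling outside Ambari, here is a minimal standalone sketch of what the description is aiming for: append "m" only when the heap size ends in a digit. The function name `normalize_heapsize` and the fallback to the default for a missing or empty value are assumptions made for illustration; they are not taken from params_linux.py, which relies on Ambari's `default()` and `format()` helpers.

```python
def normalize_heapsize(raw, default="1024"):
    """Build an -Xmx flag from a raw zk_server_heapsize value.

    Hypothetical helper for illustration only, not part of Ambari.
    """
    value = "" if raw is None else str(raw).strip()
    if not value:
        # Assumption: a missing or blank value falls back to the default.
        value = default
    # A value ending in a digit carries no unit, so treat it as megabytes.
    if value[-1].isdigit():
        value += "m"
    return "-Xmx" + value


if __name__ == "__main__":
    for raw in ("1024", "1024m", " 2g ", None):
        print(repr(raw), "->", normalize_heapsize(raw))
```

Values that already carry a unit ("1024m", "2g") pass through untouched, so the check is safe to run on clusters that were upgraded through `UpgradeCatalog222` as well as on those that picked up the raw default.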