Hi Steve,

Did the issue with the Hadoop distro being in safemode ever get worked out? That is what was causing the setfacl commands to fail: the preceding mkdir command failed because the distro was in safemode, so the directory did not exist.

***INFO: Setting HDFS ACLs for snapshot scan support
mkdir: Cannot create directory /apps/hbase/data/archive. Name node is in safe mode.
chown: Unknown command
Did you mean -chown? This command begins with a dash.
setfacl: `/apps/hbase/data/archive': No such file or directory
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....

The distro being in safemode can also prevent the DTMs from becoming ready, which causes sqstart to "hang" forever.

This also brings up a longtime issue in the design of starting Trafodion: we wait forever for the DTMs to be ready. We do that because we have no idea how long recovery will take. However, there are many other reasons the DTMs might not become ready when recovery is not even running. Perhaps we could change the design to distinguish between waiting for DTM recovery to complete and the DTMs failing to become ready for some other reason, and time out in the latter case. That would not fix this particular hang, but it would help.

--Marvin

From: Varnau, Steve (Trafodion)
Sent: Monday, March 02, 2015 11:01 AM
To: Bouaziz, Khaled; Subbiah, Suresh; [email protected]
Cc: Anderson, Marvin
Subject: RE: check test failures

Oh yes, I think Khaled is right about the error that complains about the setfacl command failing. I believe Marvin has a fix in review right now.

I am more worried about the other one (timeout starting trafodion). It was seen a couple of times on Friday, and it is not obvious why it hangs.

-Steve

From: Bouaziz, Khaled
Sent: Monday, March 02, 2015 06:45
To: Subbiah, Suresh; [email protected]
Cc: Anderson, Marvin; Varnau, Steve (Trafodion)
Subject: RE: check test failures

This issue is probably related to the one in the attached email.

From: Trafodion-firefighters [mailto:trafodion-firefighters-bounces+khaled.bouaziz=hp....@lists.launchpad.net] On Behalf Of Subbiah, Suresh
Sent: Monday, March 02, 2015 8:31 AM
To: [email protected]
Subject: [Trafodion-firefighters] FW: check test failures

Hi FFs,

Any suggestions on how to resolve this problem?

Thanks
Suresh

From: Varnau, Steve (Trafodion)
Sent: Saturday, February 28, 2015 11:08 PM
To: Subbiah, Suresh; Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes; Cooper, Joanie
Subject: RE: check test failures

Hi Suresh,

Joanie hit this on Friday too. There seems to be an intermittent failure introduced recently, which causes a hang in sqstart. Maybe firefighters need to tackle it.

-Steve

From: Subbiah, Suresh
Sent: Saturday, February 28, 2015 17:53
To: Varnau, Steve (Trafodion); Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes
Subject: check test failures

Hi Steve, Chris,

For my checkin https://review.trafodion.org/#/c/1169/ I am getting check test failures in either the phoenix or the core-seabase suites. As far as I can tell, the issue is not with any particular test failing, but with something in the setup stage not working as expected. Is there anything I can do to avoid these errors?

Copying Selva since I think his tests may be running into similar problems. I did not check, though.

Thanks
Suresh

Current Run

For the current run, the first error I see for the failing phoenix test is

2015-03-01 01:27:01 ***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command failed
2015-03-01 01:27:01 ***ERROR: traf_hortonworks_mods98 exited with error.
2015-03-01 01:27:01 ***ERROR: Please check log files.
2015-03-01 01:27:01 ***ERROR: Exiting....

Previous run (core Seabase)

***INFO: End of DCS install.
***INFO: starting Trafodion instance
***INFO: End of DCS install.
***INFO: starting Trafodion instance
Build timed out (after 200 minutes). Marking the build as failed.

Two runs before (once in phoenix and then in core Seabase)

***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....

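
Both failure modes in this thread trace back to the name node being in safe mode when the installer runs. One possible guard, sketched here and not taken from the actual Trafodion scripts or from the fix under review, is to poll `hdfs dfsadmin -safemode get` with a bounded timeout before the mkdir/setfacl steps, so the run fails early with a clear reason instead of failing at setfacl or waiting indefinitely. The function names, timeout, and poll interval are hypothetical, and the check assumes the command prints a line containing "Safe mode is OFF" once safe mode has cleared.

```python
import subprocess
import time


def safemode_is_off(run=subprocess.run):
    """Return True if `hdfs dfsadmin -safemode get` reports safe mode OFF.

    Assumption: the command's stdout contains "Safe mode is OFF" when the
    name node has left safe mode. `run` is injectable so the check can be
    exercised without a Hadoop cluster.
    """
    result = run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "Safe mode is OFF" in result.stdout


def wait_for_safemode_off(timeout_s=300, poll_s=5, check=safemode_is_off,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until safe mode is off or the timeout expires.

    Returns True if safe mode cleared in time, False on timeout.
    The timeout and poll interval are illustrative values only.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    if not wait_for_safemode_off():
        raise SystemExit("Name node still in safe mode; not setting HDFS ACLs")
```

Unlike `hdfs dfsadmin -safemode wait`, which blocks until safe mode ends, the bounded loop gives the caller a definite point at which to report the problem.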
--
Mailing list: https://launchpad.net/~trafodion-firefighters
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~trafodion-firefighters
More help   : https://help.launchpad.net/ListHelp

