Hi Steve,
Did the issue of the Hadoop distro being in safemode ever get worked out?  That 
is what was causing the setfacl commands to fail: the preceding mkdir command 
failed because the distro was in safemode, so the directory was never created.

***INFO: Setting HDFS ACLs for snapshot scan support
mkdir: Cannot create directory /apps/hbase/data/archive. Name node is in safe 
mode.
chown: Unknown command
Did you mean -chown?  This command begins with a dash.
setfacl: `/apps/hbase/data/archive': No such file or directory
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command 
failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....
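
One way to avoid this failure is to have the setup step wait for the NameNode to leave safemode before it creates the directory. The following is only a sketch, not what the installer actually does: the function name wait_for_safemode_off, the HDFS override variable, and the default timeout values are all made up for illustration. It relies only on "hdfs dfsadmin -safemode get", which reports whether safe mode is ON or OFF.

```shell
# Sketch only: hold off on HDFS changes until the NameNode has left safe mode.
# HDFS is a hypothetical override so the function can be tried without a cluster.
HDFS=${HDFS:-hdfs}

# Poll "hdfs dfsadmin -safemode get" until it reports OFF, or give up.
#   $1 = maximum seconds to wait (assumed default: 300)
#   $2 = seconds between polls   (assumed default: 5)
wait_for_safemode_off() {
    local timeout=${1:-300} interval=${2:-5} waited=0
    while :; do
        if "$HDFS" dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is OFF'; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "***ERROR: NameNode still in safe mode after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A setup script could call wait_for_safemode_off and exit on failure before running the mkdir and setfacl commands on /apps/hbase/data/archive, so the error points at safemode rather than at setfacl. If an unbounded wait is acceptable, "hdfs dfsadmin -safemode wait" blocks until safe mode is off. Separately, the "chown: Unknown command" line in the log suggests that chown was invoked without its leading dash, which would need its own fix.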

The distro being in safemode can also prevent the DTMs from being ready which 
will cause sqstart to "hang" forever.

This also brings up a longtime issue in the design of starting Trafodion, 
namely that we will wait forever for the DTMs to be ready.  We do that because 
we have no idea how long recovery will take.  However, there are many other 
reasons the DTMs don't become ready when recovery is not even running.  Perhaps 
we could change the design to distinguish between waiting for the DTMs' 
recovery to complete and the DTMs not becoming ready for some other reason 
(and time out on the latter).  Not that that fixes this particular hang, but it 
would help.
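
To make that idea concrete, here is a rough sketch of such a wait loop. It is an illustration under assumptions, not existing Trafodion code: dtm_ready and dtm_in_recovery are hypothetical probe commands standing in for whatever checks are really available, and the timeout and poll interval defaults are invented.

```shell
# Sketch only. Both probes are hypothetical placeholders:
#   DTM_READY_CMD       exits 0 once the DTMs are ready
#   DTM_RECOVERING_CMD  exits 0 while DTM recovery is actually running
DTM_READY_CMD=${DTM_READY_CMD:-dtm_ready}
DTM_RECOVERING_CMD=${DTM_RECOVERING_CMD:-dtm_in_recovery}

# Wait for the DTMs to become ready.
# Polls that find recovery running never count toward the timeout, so a long
# recovery is waited out indefinitely. Polls that find the DTMs not ready and
# not recovering accumulate, and the wait fails once they reach the limit.
#   $1 = seconds allowed outside recovery (assumed default: 600)
#   $2 = seconds between polls            (assumed default: 10)
wait_for_dtm() {
    local timeout=${1:-600} interval=${2:-10} stalled=0
    while ! "$DTM_READY_CMD"; do
        if ! "$DTM_RECOVERING_CMD"; then
            if [ "$stalled" -ge "$timeout" ]; then
                echo "***ERROR: DTMs not ready and not in recovery after ${stalled}s" >&2
                return 1
            fi
            stalled=$((stalled + interval))
        fi
        sleep "$interval"
    done
    return 0
}
```

With something like this, a start script would fail with a clear message when the DTMs are stuck for a reason other than recovery (for example, the distro being in safemode), instead of running until the build times out.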

--Marvin

From: Varnau, Steve (Trafodion)
Sent: Monday, March 02, 2015 11:01 AM
To: Bouaziz, Khaled; Subbiah, Suresh; [email protected]
Cc: Anderson, Marvin
Subject: RE: check test failures

Oh yes, I think Khaled is right for the error that complains about the setfacl 
command failing.  I believe Marvin has a fix in review right now.

I am more worried about the other one (timeout starting trafodion). It was seen 
a couple of times Friday and it is not obvious why it hangs.

-Steve

From: Bouaziz, Khaled
Sent: Monday, March 02, 2015 06:45
To: Subbiah, Suresh; 
[email protected]<mailto:[email protected]>
Cc: Anderson, Marvin; Varnau, Steve (Trafodion)
Subject: RE: check test failures

This issue is probably related to the one in the attached email


From: Trafodion-firefighters 
[mailto:trafodion-firefighters-bounces+khaled.bouaziz=hp....@lists.launchpad.net]
 On Behalf Of Subbiah, Suresh
Sent: Monday, March 02, 2015 8:31 AM
To: 
[email protected]<mailto:[email protected]>
Subject: [Trafodion-firefighters] FW: check test failures

Hi FFs,

Any suggestions on how to resolve this problem?

Thanks
Suresh

From: Varnau, Steve (Trafodion)
Sent: Saturday, February 28, 2015 11:08 PM
To: Subbiah, Suresh; Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes; Cooper, Joanie
Subject: RE: check test failures

Hi Suresh,

Joanie hit this on Friday too. There seems to be an intermittent failure 
introduced recently, which causes a hang in sqstart. Maybe firefighters need to 
tackle it.

-Steve

From: Subbiah, Suresh
Sent: Saturday, February 28, 2015 17:53
To: Varnau, Steve (Trafodion); Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes
Subject: check test failures

Hi Steve, Chris

For my checkin https://review.trafodion.org/#/c/1169/ I am getting check test 
failures in either the phoenix or the core-seabase suites.
As far as I can tell the issue is not with any particular test failing, but 
with something in the setup stage not working as expected.
Is there anything I can do to avoid these errors? Copying Selva since I think 
his tests may be running into similar problems. I did not check though.

Thanks
Suresh


Current Run
For the current run, the first error I see for the failing phoenix test is:
2015-03-01 01:27:01 ***ERROR: (hdfs dfs -setfacl -R -m mask::rwx 
/apps/hbase/data/archive) command failed
2015-03-01 01:27:01 ***ERROR: traf_hortonworks_mods98 exited with error.
2015-03-01 01:27:01 ***ERROR: Please check log files.
2015-03-01 01:27:01 ***ERROR: Exiting....

Previous run (core Seabase)
***INFO: End of DCS install.
***INFO: starting Trafodion instance
***INFO: End of DCS install.
***INFO: starting Trafodion instance
Build timed out (after 200 minutes). Marking the build as failed.

Two runs before (once in phoenix and then in core Seabase)
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command 
failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....
