This issue is probably related to the one discussed in the attached email thread.
From: Trafodion-firefighters
[mailto:trafodion-firefighters-bounces+khaled.bouaziz=hp....@lists.launchpad.net]
On Behalf Of Subbiah, Suresh
Sent: Monday, March 02, 2015 8:31 AM
To: [email protected]
Subject: [Trafodion-firefighters] FW: check test failures
Hi FFs,
Any suggestions on how to resolve this problem?
Thanks
Suresh
From: Varnau, Steve (Trafodion)
Sent: Saturday, February 28, 2015 11:08 PM
To: Subbiah, Suresh; Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes; Cooper, Joanie
Subject: RE: check test failures
Hi Suresh,
Joanie hit this on Friday too. There seems to be an intermittent failure
introduced recently, which causes a hang in sqstart. Maybe firefighters need to
tackle it.
-Steve
From: Subbiah, Suresh
Sent: Saturday, February 28, 2015 17:53
To: Varnau, Steve (Trafodion); Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes
Subject: check test failures
Hi Steve, Chris
For my checkin https://review.trafodion.org/#/c/1169/ I am getting check test
failures in either the phoenix or the core-seabase suites.
As far as I can tell the issue is not with any particular test failing, but
with something in the setup stage not working as expected.
Is there anything I can do to avoid these errors? Copying Selva since I think
his tests may be running into similar problems. I have not checked, though.
Thanks
Suresh
Current Run
For the current run, the first error I see for the failing phoenix test is:
2015-03-01 01:27:01 ***ERROR: (hdfs dfs -setfacl -R -m mask::rwx
/apps/hbase/data/archive) command failed
2015-03-01 01:27:01 ***ERROR: traf_hortonworks_mods98 exited with error.
2015-03-01 01:27:01 ***ERROR: Please check log files.
2015-03-01 01:27:01 ***ERROR: Exiting....
Previous run (core Seabase)
***INFO: End of DCS install.
***INFO: starting Trafodion instance
***INFO: End of DCS install.
***INFO: starting Trafodion instance
Build timed out (after 200 minutes). Marking the build as failed.
Two runs before (once in phoenix and once in core Seabase)
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command
failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....
--- Begin Message ---
Yeah, hdfs definitely has permission to do anything in the filesystem. The only
question is who will be the owner/group of the archive directory. My script
assumed that hbase user was the owner of /hbase and was the desired owner of
/hbase/archive. So you might want to add a chown command after the mkdir?
-Steve
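Steve's mkdir-then-chown suggestion could be sketched as the shell function below. The paths and the hbase owner come from this thread; the function name and argument handling are illustrative, not from the actual installer.

```shell
# Sketch of the suggested fix: the hdfs superuser creates the archive
# directory, then hands ownership back to hbase so hbase stays the owner
# of everything under /hbase, per the chown-after-mkdir suggestion above.
create_archive_as_hbase() {
  dir="$1"    # e.g. /hbase/archive on Cloudera
  sudo -u hdfs hdfs dfs -mkdir -p "$dir" || return 1
  sudo -u hdfs hdfs dfs -chown hbase:hbase "$dir"
}

# Usage (not run here):
# create_archive_as_hbase /hbase/archive
```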
From: Anderson, Marvin
Sent: Tuesday, February 24, 2015 23:25
To: Varnau, Steve (Trafodion); Bouaziz, Khaled
Cc: Moran, Amanda
Subject: RE: Gate job failing
I get an error using the hbase userid to do the mkdir on our test clusters; it
has to be the hdfs userid:
[andersma@sea-nodepool installer]$ sudo su hbase --command "hdfs dfs -mkdir -p
/hbase/archive"
mkdir: Permission denied: user=hbase, access=WRITE,
inode="/":hdfs:hdfs:drwxr-xr-x
[andersma@sea-nodepool installer]$ sudo su hdfs --command "hdfs dfs -mkdir -p
/hbase/archive"
So, I guess I'll put it in for the hdfs userid.
--Marvin
From: Varnau, Steve (Trafodion)
Sent: Tuesday, February 24, 2015 4:59 PM
To: Anderson, Marvin; Bouaziz, Khaled
Cc: Moran, Amanda
Subject: RE: Gate job failing
Here is what my script does that runs on non-cluster-mgr Cloudera nodes:
sudo -u hbase hdfs dfs -mkdir -p /hbase/archive
sudo -u hdfs hdfs dfs -setfacl -R -m user:jenkins:rwx /hbase/archive
sudo -u hdfs hdfs dfs -setfacl -R -m default:user:jenkins:rwx /hbase/archive
sudo -u hdfs hdfs dfs -setfacl -R -m mask::rwx /hbase/archive
This is sufficient for the tests. I can't say what other implications there
might be.
-Steve
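A guarded variant of the four commands above might look like the following sketch. It creates the directory before applying the ACLs, which would avoid the "No such file or directory" setfacl failures seen in the gate logs. This is an illustration, not the actual gate script; the function name is hypothetical.

```shell
# Sketch: grant the jenkins user rwx on the hbase archive dir via HDFS ACLs,
# creating the directory first so setfacl cannot fail on a missing path.
set_jenkins_acls() {
  dir="$1"    # /hbase/archive on Cloudera nodes
  sudo -u hbase hdfs dfs -mkdir -p "$dir" || return 1
  sudo -u hdfs hdfs dfs -setfacl -R -m user:jenkins:rwx "$dir" || return 1
  sudo -u hdfs hdfs dfs -setfacl -R -m default:user:jenkins:rwx "$dir" || return 1
  sudo -u hdfs hdfs dfs -setfacl -R -m mask::rwx "$dir"
}

# Usage (not run here):
# set_jenkins_acls /hbase/archive
```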
From: Anderson, Marvin
Sent: Tuesday, February 24, 2015 13:44
To: Varnau, Steve (Trafodion); Bouaziz, Khaled
Cc: Moran, Amanda
Subject: RE: Gate job failing
Does the installer just do a simple HDFS mkdir command like:
Hortonworks
hdfs dfs -mkdir -p /apps/hbase/data/archive
Cloudera
hdfs dfs -mkdir -p /hbase/archive
Does it need permissions set any specific way, or any particular ownership or
other settings?
--Marvin
From: Varnau, Steve (Trafodion)
Sent: Tuesday, February 24, 2015 2:58 PM
To: Bouaziz, Khaled; Anderson, Marvin
Cc: Moran, Amanda
Subject: RE: Gate job failing
This may indeed be due to the way we clean up the test nodes between jobs. To
protect against any change or test that corrupts hbase data, we completely
remove hbase data (in HDFS and in Zookeeper) at the beginning of each job and
bring hbase back up to initialize it.
Apparently just bringing up hbase does not create that directory. So, you could
add installer logic to create it if it does not exist, but if you think it
should always be there on a normal system, then I can have the cleanup script
re-create the archive directory when doing that clean up.
-Steve
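The cleanup-side option described above (re-creating the archive directory after hbase data is wiped) might be sketched as follows. The two distro paths are the ones quoted in this thread; the function name and the choice of chown owner are assumptions for illustration.

```shell
# Sketch: after the between-job cleanup removes hbase data, restore the
# archive directory if hbase startup did not re-create it.
restore_archive_after_cleanup() {
  # Pick the archive path for the distro in use; both appear in this thread.
  case "$1" in
    cloudera)    dir=/hbase/archive ;;
    hortonworks) dir=/apps/hbase/data/archive ;;
    *) echo "unknown distro: $1" >&2; return 1 ;;
  esac
  # Only create the directory if it is actually missing.
  if ! sudo -u hdfs hdfs dfs -test -d "$dir"; then
    sudo -u hdfs hdfs dfs -mkdir -p "$dir"
    sudo -u hdfs hdfs dfs -chown hbase:hbase "$dir"
  fi
}

# Usage (not run here):
# restore_archive_after_cleanup cloudera
```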
From: Bouaziz, Khaled
Sent: Tuesday, February 24, 2015 11:31
To: Anderson, Marvin
Cc: Varnau, Steve (Trafodion); Moran, Amanda
Subject: RE: Gate job failing
Hi Marvin:
I think Steve mentioned something like this before, and he may already have a
solution.
thanks
From: Anderson, Marvin
Sent: Tuesday, February 24, 2015 1:12 PM
To: Bouaziz, Khaled
Cc: Varnau, Steve (Trafodion); Moran, Amanda
Subject: Gate job failing
Hi Khaled,
I was checking in the changes for snapshot scan support. Those changes run
fine on several of our test clusters, but they are failing on the Jenkins build
gate machines for both Cloudera and Hortonworks.
***INFO: Setting HDFS ACLs for snapshot scan support
setfacl: `/hbase/archive': No such file or directory
***ERROR: (hdfs dfs -setfacl -R -m user:trafodion:rwx /hbase/archive) command
failed
***ERROR: traf_cloudera_mods98 exited with error.
***INFO: Setting HDFS ACLs for snapshot scan support
setfacl: `/apps/hbase/data/archive': No such file or directory
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command
failed
***ERROR: traf_hortonworks_mods98 exited with error.
Are these missing directories ones we should be creating, or should they already
be there? It appears they are not there on the build machines but were there
on all our test machines. So, is this a problem with the build machines' hadoop
env, or something we should be creating?
--Marvin
--- End Message ---
--
Mailing list: https://launchpad.net/~trafodion-firefighters
Post to : [email protected]
Unsubscribe : https://launchpad.net/~trafodion-firefighters
More help : https://help.launchpad.net/ListHelp