This issue is probably related to the one in the attached email

From: Trafodion-firefighters 
[mailto:trafodion-firefighters-bounces+khaled.bouaziz=hp....@lists.launchpad.net]
 On Behalf Of Subbiah, Suresh
Sent: Monday, March 02, 2015 8:31 AM
To: [email protected]
Subject: [Trafodion-firefighters] FW: check test failures

Hi FFs,

Any suggestions on how to resolve this problem?

Thanks
Suresh

From: Varnau, Steve (Trafodion)
Sent: Saturday, February 28, 2015 11:08 PM
To: Subbiah, Suresh; Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes; Cooper, Joanie
Subject: RE: check test failures

Hi Suresh,

Joanie hit this on Friday too. There seems to be an intermittent failure 
introduced recently, which causes a hang in sqstart. Maybe firefighters need to 
tackle it.

-Steve

From: Subbiah, Suresh
Sent: Saturday, February 28, 2015 17:53
To: Varnau, Steve (Trafodion); Sheedy, Chris (Trafodion)
Cc: Govindarajan, Selvaganes
Subject: check test failures

Hi Steve, Chris

For my checkin https://review.trafodion.org/#/c/1169/ I am getting check test 
failures in either the phoenix or the core-seabase suites.
As far as I can tell the issue is not with any particular test failing, but 
with something in the setup stage not working as expected.
Is there anything I can do to avoid these errors? Copying Selva since I think 
his tests may be running into similar problems. I did not check, though.

Thanks
Suresh


Current Run
For the current run the first error I see for failing phoenix test is
2015-03-01 01:27:01 ***ERROR: (hdfs dfs -setfacl -R -m mask::rwx 
/apps/hbase/data/archive) command failed
2015-03-01 01:27:01 ***ERROR: traf_hortonworks_mods98 exited with error.
2015-03-01 01:27:01 ***ERROR: Please check log files.
2015-03-01 01:27:01 ***ERROR: Exiting....

Previous run (core Seabase)
***INFO: End of DCS install.
***INFO: starting Trafodion instance
***INFO: End of DCS install.
***INFO: starting Trafodion instance
Build timed out (after 200 minutes). Marking the build as failed.

Two runs before (once in phoenix and then in core Seabase)
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command 
failed
***ERROR: traf_hortonworks_mods98 exited with error.
***ERROR: Please check log files.
***ERROR: Exiting....
--- Begin Message ---
Yeah, hdfs definitely has permission to do anything in the filesystem. The only 
question is who will be the owner/group of the archive directory. My script 
assumed that hbase user was the owner of /hbase and was the desired owner of 
/hbase/archive.  So you might want to add a chown command after the mkdir?
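Steve's suggestion might look something like the following (a sketch only; the function name is hypothetical, and hbase:hbase ownership is an assumption based on this thread):

```shell
# Hypothetical helper: create the archive dir as the hdfs superuser,
# then hand ownership to hbase, matching what a live HBase would create.
create_hbase_archive() {
  local dir="$1"
  # hdfs can create under "/" where the hbase user cannot
  sudo -u hdfs hdfs dfs -mkdir -p "$dir"
  # chown in HDFS requires the superuser, so this also runs as hdfs
  sudo -u hdfs hdfs dfs -chown -R hbase:hbase "$dir"
}
```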



-Steve



From: Anderson, Marvin
Sent: Tuesday, February 24, 2015 23:25
To: Varnau, Steve (Trafodion); Bouaziz, Khaled
Cc: Moran, Amanda
Subject: RE: Gate job failing



I get an error using the hbase userid to do the mkdir on our test clusters; it 
has to be the hdfs userid



[andersma@sea-nodepool installer]$ sudo su hbase --command "hdfs dfs -mkdir -p 
/hbase/archive"

mkdir: Permission denied: user=hbase, access=WRITE, 
inode="/":hdfs:hdfs:drwxr-xr-x



[andersma@sea-nodepool installer]$ sudo su hdfs --command "hdfs dfs -mkdir -p 
/hbase/archive"



So, I guess I'll put it in for the hdfs userid.



--Marvin



From: Varnau, Steve (Trafodion)
Sent: Tuesday, February 24, 2015 4:59 PM
To: Anderson, Marvin; Bouaziz, Khaled
Cc: Moran, Amanda
Subject: RE: Gate job failing



Here is what my script does that runs on non-cluster-mgr Cloudera nodes:



  sudo -u hbase hdfs dfs -mkdir -p /hbase/archive

  sudo -u hdfs hdfs dfs -setfacl -R -m user:jenkins:rwx /hbase/archive

  sudo -u hdfs hdfs dfs -setfacl -R -m default:user:jenkins:rwx /hbase/archive

  sudo -u hdfs hdfs dfs -setfacl -R -m mask::rwx /hbase/archive



This is sufficient for the tests. I can't say what other implications there 
might be.
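Since the same setfacl sequence has to target the Cloudera path (/hbase/archive) and the Hortonworks path (/apps/hbase/data/archive), one way to avoid duplicating it per distro would be a small wrapper (hypothetical; the paths and user names are the ones quoted elsewhere in this thread):

```shell
# Hypothetical wrapper over the three setfacl calls above; pass the
# distro-specific archive path and the test user (e.g. jenkins or trafodion).
grant_archive_acls() {
  local dir="$1" user="$2"
  # named-user ACL, its default (inherited) variant, and the mask
  sudo -u hdfs hdfs dfs -setfacl -R -m "user:${user}:rwx" "$dir"
  sudo -u hdfs hdfs dfs -setfacl -R -m "default:user:${user}:rwx" "$dir"
  sudo -u hdfs hdfs dfs -setfacl -R -m "mask::rwx" "$dir"
}
```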



-Steve



From: Anderson, Marvin
Sent: Tuesday, February 24, 2015 13:44
To: Varnau, Steve (Trafodion); Bouaziz, Khaled
Cc: Moran, Amanda
Subject: RE: Gate job failing



Does the installer just do a simple HDFS mkdir command like



Hortonworks

hdfs dfs -mkdir -p /apps/hbase/data/archive



Cloudera

hdfs dfs -mkdir -p /hbase/archive



Does it need permissions set in any specific way, or particular ownership or 
other settings?



--Marvin



From: Varnau, Steve (Trafodion)
Sent: Tuesday, February 24, 2015 2:58 PM
To: Bouaziz, Khaled; Anderson, Marvin
Cc: Moran, Amanda
Subject: RE: Gate job failing



This may indeed be due to the way we clean up the test nodes between jobs. To 
protect against any change or test that corrupts hbase data, we completely 
remove hbase data (in HDFS and in Zookeeper) at the beginning of each job and 
bring hbase back up to initialize it.



Apparently just bringing up hbase is not creating that directory. So, you could 
add installer logic to create it if it does not exist, but if you think it 
should always be there on a normal system, then I can have the script 
re-create the archive directory when doing that clean up.
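If the cleanup script takes that route, a guarded re-create might look like this (a sketch; the function name is made up):

```shell
# Hypothetical cleanup-time guard: recreate the archive dir only if missing.
restore_archive_dir() {
  local dir="$1"
  # `hdfs dfs -test -d` exits non-zero when the directory does not exist
  if ! sudo -u hdfs hdfs dfs -test -d "$dir"; then
    sudo -u hdfs hdfs dfs -mkdir -p "$dir"
  fi
}
```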



-Steve



From: Bouaziz, Khaled
Sent: Tuesday, February 24, 2015 11:31
To: Anderson, Marvin
Cc: Varnau, Steve (Trafodion); Moran, Amanda
Subject: RE: Gate job failing



Hi Marvin:



I think Steve mentioned something like this before, and I think he already has 
a solution.



thanks



From: Anderson, Marvin
Sent: Tuesday, February 24, 2015 1:12 PM
To: Bouaziz, Khaled
Cc: Varnau, Steve (Trafodion); Moran, Amanda
Subject: Gate job failing



Hi Khaled,

I was checking in the changes for snapshot scan support. Those changes run 
fine on several of our test clusters, but they are failing on the Jenkins 
build gate machines for both Cloudera and Hortonworks.



***INFO: Setting HDFS ACLs for snapshot scan support

setfacl: `/hbase/archive': No such file or directory

***ERROR: (hdfs dfs -setfacl -R -m user:trafodion:rwx /hbase/archive) command 
failed

***ERROR: traf_cloudera_mods98 exited with error.





***INFO: Setting HDFS ACLs for snapshot scan support
setfacl: `/apps/hbase/data/archive': No such file or directory
***ERROR: (hdfs dfs -setfacl -R -m mask::rwx /apps/hbase/data/archive) command 
failed
***ERROR: traf_hortonworks_mods98 exited with error.



Are these missing directories ones we should be creating, or should they 
already be there? It appears they are not there on the build machines but were 
there on all our test machines. So, is this a problem with the build machines' 
Hadoop env, or something we should be creating?



--Marvin








--- End Message ---
-- 
Mailing list: https://launchpad.net/~trafodion-firefighters
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~trafodion-firefighters
More help   : https://help.launchpad.net/ListHelp
