I created a bug earlier today and have been gathering some info. I'm trying to check whether we started noticing these failures after the change in the Cloudera config on 10th April.
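For anyone who wants to line up the timeline from the logs quoted in this message, here is a minimal sketch of how the pause and stall durations could be pulled out. It assumes the log lines keep the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen in the excerpts; the function names (`parse_ts`, `summarize_pauses`, `stuck_seconds`) and the default marker string are hypothetical helpers written for this illustration, not part of HBase or Trafodion tooling.

```python
import re
from datetime import datetime

# Assumed timestamp layout, taken from the log excerpts quoted in this message.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    match = TS_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), TS_FORMAT)


def summarize_pauses(lines):
    """Return (count, total_ms, max_ms) over JvmPauseMonitor-style entries."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match is not None:
            pauses.append(int(match.group(1)))
    return len(pauses), sum(pauses), max(pauses, default=0)


def stuck_seconds(lines, marker="SplitLogManager: total tasks = 1 unassigned = 1"):
    """Seconds between the first and last timestamped line containing marker."""
    stamps = []
    for line in lines:
        if marker in line:
            ts = parse_ts(line)
            if ts is not None:
                stamps.append(ts)
    if len(stamps) < 2:
        return 0.0
    return (stamps[-1] - stamps[0]).total_seconds()
```

Feeding the master log lines to `stuck_seconds` gives a rough figure for how long the hbase:meta log split sat unassigned, which can then be compared with the 90 minute build timeout.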
https://bugs.launchpad.net/bugs/1444599

In this failure I am also noticing the following:

http://logs.trafodion.org/daily/jdbc_test-cm5.3/89649db/

From console output (Batch ALL test times out - usually completes in 2-3 mins):

2015-04-15 12:36:17 Batch : total of 6000 rows inserted
2015-04-15 12:36:22 Batch : Passed
2015-04-15 13:17:41 Build timed out (after 90 minutes). Marking the build as failed.
2015-04-15 13:17:41 Build was aborted
2015-04-15 13:17:41 [PostBuildScript] - Execution post build scripts.

In dtm logs:

2015-04-15 12:41:12,775 ERROR transactional.TransactionManager: doCommitX, received incorrect result size: 0
2015-04-15 12:41:12,790 ERROR transactional.TransactionManager: doCommitX, received incorrect result size: 0
2015-04-15 12:41:12,791 ERROR transactional.TransactionManager: doCommitX, received incorrect result size: 0
2015-04-15 12:41:12,794 ERROR transactional.TransactionManager: doCommitX, received incorrect result size: 0
2015-04-15 12:41:12,801 ERROR transactional.TransactionManager: doCommitX, received incorrect result size: 0

In master logs:

2015-04-15 12:37:23,454 INFO org.apache.hadoop.hbase.master.RegionStates: Onlined be079188e0d87f47c814f81aa320ed43 on slave-cm53.trafodion.org,60020,1429100734898
2015-04-15 12:41:15,009 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [slave-cm53.trafodion.org,60020,1429100734898]
2015-04-15 12:41:15,017 INFO org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Splitting hbase:meta logs for slave-cm53.trafodion.org,60020,1429100734898
2015-04-15 12:41:15,087 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [slave-cm53.trafodion.org,60020,1429100734898]
2015-04-15 12:41:15,090 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting 1 logs in [hdfs://slave-cm53.trafodion.org:8020/hbase/WALs/slave-cm53.trafodion.org,60020,1429100734898-splitting]
2015-04-15 12:41:15,417 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fslave-cm53.trafodion.org%2C60020%2C1429100734898-splitting%2Fslave-cm53.trafodion.org%252C60020%252C1429100734898.1429100750787.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
.......
2015-04-15 13:17:55,035 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fslave-cm53.trafodion.org%2C60020%2C1429100734898-splitting%2Fslave-cm53.trafodion.org%252C60020%252C1429100734898.1429100750787.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-04-15 13:18:00,036 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fslave-cm53.trafodion.org%2C60020%2C1429100734898-splitting%2Fslave-cm53.trafodion.org%252C60020%252C1429100734898.1429100750787.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-04-15 13:18:05,037 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fslave-cm53.trafodion.org%2C60020%2C1429100734898-splitting%2Fslave-cm53.trafodion.org%252C60020%252C1429100734898.1429100750787.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-04-15 13:18:11,037 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fslave-cm53.trafodion.org%2C60020%2C1429100734898-splitting%2Fslave-cm53.trafodion.org%252C60020%252C1429100734898.1429100750787.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}

Also the following in the regionserver logs, indicating a restart:

2015-04-15 12:38:35,420 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1090ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=1276ms
2015-04-15 12:38:40,408 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1120ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=1216ms
2015-04-15 12:38:42,108 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1199ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=1230ms
2015-04-15 12:38:45,087 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1285ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=1306ms
...
2015-04-15 12:40:46,094 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1006ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=3 time=4681ms
2015-04-15 12:41:03,521 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 27833ms for sessionid 0x14cbd0a059c0000, closing socket connection and attempting reconnect
2015-04-15 12:41:14,898 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.3.1--1, built on 01/28/2015 00:41 GMT

From: Trafodion-firefighters [mailto:trafodion-firefighters-bounces+arvind.narain=hp....@lists.launchpad.net] On Behalf Of Chen, Alice (Trafodion)
Sent: Wednesday, April 15, 2015 11:26 AM
To: Varnau, Steve (Trafodion); Johnson, Stacey; [email protected]
Subject: Re: [Trafodion-firefighters] Daily build 2015-04-15 08:30:00 UTC of trafodion/core -- Test Failures

The Phoenix part1 and part2 T4 tests on Cloudera have also been timing out more frequently.

https://jenkins02.trafodion.org/job/phoenix_part1_T4-cm5.3/buildTimeTrend
https://jenkins02.trafodion.org/job/phoenix_part2_T4-cm5.3/buildTimeTrend

Cheers,
Alice

From: Trafodion-firefighters [mailto:trafodion-firefighters-bounces+alice.chen=hp....@lists.launchpad.net] On Behalf Of Varnau, Steve (Trafodion)
Sent: Wednesday, April 15, 2015 11:13 AM
To: Johnson, Stacey; [email protected]
Subject: Re: [Trafodion-firefighters] Daily build 2015-04-15 08:30:00 UTC of trafodion/core -- Test Failures

Looks like the jdbc_test-cm5.3 job has been timing out more frequently since Monday evening, which is also affecting some check/gate jobs.
https://jenkins02.trafodion.org/job/jdbc_test-cm5.3/buildTimeTrend

-Steve

From: Trafodion-firefighters [mailto:trafodion-firefighters-bounces+steve.varnau=hp....@lists.launchpad.net] On Behalf Of Johnson, Stacey
Sent: Wednesday, April 15, 2015 10:59
To: [email protected]
Subject: [Trafodion-firefighters] Daily build 2015-04-15 08:30:00 UTC of trafodion/core -- Test Failures

Build failed.

- traf-pub-release-ahw2.2 http://logs.trafodion.org/daily/traf-pub-release-ahw2.2/bbeb7f2 : SUCCESS in 42m 23s
- traf-pub-debug-ahw2.2 http://logs.trafodion.org/daily/traf-pub-debug-ahw2.2/f95089a : SUCCESS in 34m 00s
- core-regress-core-cm5.3 http://logs.trafodion.org/daily/core-regress-core-cm5.3/81e8681 : SUCCESS in 2h 43m 40s
- core-regress-core-ahw2.2 http://logs.trafodion.org/daily/core-regress-core-ahw2.2/cb82357 : SUCCESS in 2h 13m 51s
- core-regress-charsets-cm5.3 http://logs.trafodion.org/daily/core-regress-charsets-cm5.3/99c65ee : SUCCESS in 1h 27m 25s
- core-regress-charsets-ahw2.2 http://logs.trafodion.org/daily/core-regress-charsets-ahw2.2/8b75695 : SUCCESS in 1h 43m 07s
- core-regress-qat-cm5.3 http://logs.trafodion.org/daily/core-regress-qat-cm5.3/66d1ced : SUCCESS in 1h 21m 22s
- core-regress-qat-ahw2.2 http://logs.trafodion.org/daily/core-regress-qat-ahw2.2/3246e89 : SUCCESS in 1h 30m 39s
- core-regress-udr-cm5.3 http://logs.trafodion.org/daily/core-regress-udr-cm5.3/a5288d2 : SUCCESS in 1h 14m 09s
- core-regress-udr-ahw2.2 http://logs.trafodion.org/daily/core-regress-udr-ahw2.2/cf87e05 : SUCCESS in 1h 26m 56s
- core-regress-catman1-cm5.3 http://logs.trafodion.org/daily/core-regress-catman1-cm5.3/088cd53 : SUCCESS in 2h 24m 35s
- core-regress-catman1-ahw2.2 http://logs.trafodion.org/daily/core-regress-catman1-ahw2.2/180b5cd : SUCCESS in 2h 35m 40s
- core-regress-compGeneral-cm5.3 http://logs.trafodion.org/daily/core-regress-compGeneral-cm5.3/c7926f1 : FAILURE in 2h 28m 38s
- core-regress-compGeneral-ahw2.2 http://logs.trafodion.org/daily/core-regress-compGeneral-ahw2.2/d52a532 : FAILURE in 2h 07m 51s
- core-regress-executor-cm5.3 http://logs.trafodion.org/daily/core-regress-executor-cm5.3/aa8bc02 : FAILURE in 4h 01m 56s
- core-regress-executor-ahw2.2 http://logs.trafodion.org/daily/core-regress-executor-ahw2.2/9e48377 : SUCCESS in 2h 18m 16s
- core-regress-fullstack2-cm5.3 http://logs.trafodion.org/daily/core-regress-fullstack2-cm5.3/255525d : SUCCESS in 58m 30s
- core-regress-fullstack2-ahw2.2 http://logs.trafodion.org/daily/core-regress-fullstack2-ahw2.2/621e99a : SUCCESS in 1h 06m 04s
- core-regress-hive-cm5.3 http://logs.trafodion.org/daily/core-regress-hive-cm5.3/261f8c7 : FAILURE in 1h 46m 16s
- core-regress-hive-ahw2.2 http://logs.trafodion.org/daily/core-regress-hive-ahw2.2/69fba37 : FAILURE in 2h 01m 26s
- core-regress-seabase-cm5.3 http://logs.trafodion.org/daily/core-regress-seabase-cm5.3/09b2351 : FAILURE in 4h 01m 50s
- core-regress-seabase-ahw2.2 http://logs.trafodion.org/daily/core-regress-seabase-ahw2.2/3d8ddd6 : SUCCESS in 2h 08m 39s
- phoenix_part1_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part1_T4-cm5.3/be849f1 : SUCCESS in 2h 15m 05s
- phoenix_part2_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/5d58449 : FAILURE in 3h 22m 13s
- phoenix_part1_T4-ahw2.2 http://logs.trafodion.org/daily/phoenix_part1_T4-ahw2.2/9a6bbf3 : SUCCESS in 2h 16m 27s
- phoenix_part2_T4-ahw2.2 http://logs.trafodion.org/daily/phoenix_part2_T4-ahw2.2/a7011cf : SUCCESS in 2h 18m 47s
- phoenix_part1_T2-cm5.3 http://logs.trafodion.org/daily/phoenix_part1_T2-cm5.3/35a7a97 : FAILURE in 49m 00s (non-voting)
- phoenix_part2_T2-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T2-cm5.3/7261156 : FAILURE in 52m 43s (non-voting)
- phoenix_part1_T2-ahw2.2 http://logs.trafodion.org/daily/phoenix_part1_T2-ahw2.2/9f9c698 : FAILURE in 57m 33s (non-voting)
- phoenix_part2_T2-ahw2.2 http://logs.trafodion.org/daily/phoenix_part2_T2-ahw2.2/2a7da36 : FAILURE in 1h 03m 10s (non-voting)
- pyodbc_test-cm5.3 http://logs.trafodion.org/daily/pyodbc_test-cm5.3/3133265 : SUCCESS in 59m 30s
- pyodbc_test-ahw2.2 http://logs.trafodion.org/daily/pyodbc_test-ahw2.2/80b0036 : SUCCESS in 1h 06m 46s
- jdbc_test-cm5.3 http://logs.trafodion.org/daily/jdbc_test-cm5.3/89649db : FAILURE in 1h 31m 44s
- jdbc_test-ahw2.2 http://logs.trafodion.org/daily/jdbc_test-ahw2.2/822b1ef : SUCCESS in 1h 19m 12s
--
Mailing list: https://launchpad.net/~trafodion-firefighters
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~trafodion-firefighters
More help   : https://help.launchpad.net/ListHelp

