Yep, the default is 60 on HW as well, and we don't seem to have the same trouble there.

-Steve

From: Trafodion-firefighters 
[mailto:trafodion-firefighters-bounces+steve.varnau=hp....@lists.launchpad.net] 
On Behalf Of Varnau, Steve (Trafodion)
Sent: Friday, April 17, 2015 15:33
To: Narain, Arvind; Johnson, Stacey
Cc: [email protected]
Subject: Re: [Trafodion-firefighters] Daily build 2015-04-17 08:30:00 UTC of 
trafodion/core -- Test Failures

Arvind,

I checked zookeeper logs on a couple of other machines that ran that test job in 
the last week or so, and it looks like you are right. At the time when the tests 
either hung or started getting errors, the "Too many connections" errors started 
showing up in the zookeeper logs.

So, is 60 too low of a number, or are we causing too many connections to 
zookeeper?
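
One way to tell those two cases apart is to look at how many connections each 
client holds while the tests run. Below is a rough sketch, not something from 
our scripts: it sends ZooKeeper's "cons" four-letter command and counts open 
connections per client IP. The function names are made up for illustration, and 
the regular expression assumes each connection line looks like 
" /ip:port[n](...)" with an IPv4 address, which should be checked against real 
output first. Newer ZooKeeper releases may also need the command whitelisted.

```python
import re
import socket
from collections import Counter

# Assumed shape of a `cons` output line: " /10.0.0.1:54321[1](queued=0,...)"
CONN_LINE = re.compile(r"^\s*/(\d+\.\d+\.\d+\.\d+):\d+\[", re.MULTILINE)


def four_letter_word(host, port, command=b"cons", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def connections_per_client(cons_output):
    """Return a Counter mapping client IP -> number of open connections."""
    return Counter(CONN_LINE.findall(cons_output))
```

For example, connections_per_client(four_letter_word("slave-cm53.trafodion.org", 2181)) 
would show whether a single client is sitting near the limit of 60.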

On initial look, I could not find an equivalent value on the HW distro.

-Steve

From: Narain, Arvind
Sent: Friday, April 17, 2015 14:49
To: Varnau, Steve (Trafodion); Johnson, Stacey
Cc: 
[email protected]<mailto:[email protected]>
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test 
Failures

I meant my knowledge on this parameter and the error seen.

From: Narain, Arvind
Sent: Friday, April 17, 2015 2:31 PM
To: Varnau, Steve (Trafodion); Johnson, Stacey
Cc: 
[email protected]<mailto:[email protected]>
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test 
Failures

Thanks Steve. This surely helps.

We do have lots of the following messages, indicating that we are either 
leaking connections or making more connections. Maybe we need more for 
authorization?

We could increase the limit on concurrent connections via maxClientCnxns. Did 
this change recently, or was it being set with the earlier distribution? Do 
check with someone; my knowledge on this is limited.

2015-04-17 12:38:00,632 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
2015-04-17 12:38:02,727 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
2015-04-17 12:38:04,033 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
2015-04-17 12:38:05,662 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /172.16.0.76 - max is 60
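
Since the log is rather large, a small script can tally these warnings per 
client address. This is only a sketch: count_rejections is a made-up name, and 
the pattern is taken from the warning text quoted above, so it will miss any 
differently worded messages.

```python
import re
import sys
from collections import Counter

# Matches the warning text quoted above; the client address follows the slash.
PATTERN = re.compile(r"Too many connections from /(\S+) - max is (\d+)")


def count_rejections(log_text):
    """Return a Counter mapping client address -> number of rejected connections."""
    return Counter(addr for addr, _limit in PATTERN.findall(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_rejections.py zookeeper.log
    with open(sys.argv[1], errors="replace") as fh:
        for addr, n in count_rejections(fh.read()).most_common():
            print(addr, n)
```

Searching the whole text rather than line by line means the count still works 
if a warning was wrapped across two lines when the log was pasted into mail.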

Now that we do have a zookeeper log, I would be interested to see which other 
jobs failed on this system in the past few days, and whether we could get more 
info from this log.

From: Varnau, Steve (Trafodion)
Sent: Friday, April 17, 2015 1:30 PM
To: Narain, Arvind; Johnson, Stacey
Cc: 
[email protected]<mailto:[email protected]>
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test 
Failures

Unfortunately, we are not archiving zookeeper logs for each job.  But I've gone 
to the machine that ran that particular job and uploaded the (rather large) 
zookeeper log to 
http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16/ for you.  
You'll have to sift thru to find the right times.

-Steve

From: Narain, Arvind
Sent: Friday, April 17, 2015 12:31
To: Johnson, Stacey; Varnau, Steve (Trafodion)
Cc: 
[email protected]<mailto:[email protected]>
Subject: RE: Daily build 2015-04-17 08:30:00 UTC of trafodion/core -- Test 
Failures

Regarding:


- phoenix_part2_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16 : FAILURE in 3h 22m 30s


Are there any zookeeper logs that will help in identifying this issue? Getting 
the following errors accessing hbase [12:40  thru 14:21]

org.trafodion.jdbc.t4.HPT4Exception: *** ERROR[1398] Error 0 occured while 
accessing the hbase subsystem. Fix that error and make sure hbase is up and 
running. Error Details: . [2015-04-17 12:40:06]

http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16/traf_run/logs/trafodion.dtm.log

2015-04-17 12:40:40,199 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2015-04-17 12:40:40,236 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x7db75f15, quorum=slave-cm53.trafodion.org:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

==

http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16/traf_run/logs/trafodion.hdfs.log

2015-04-17 12:38:19,321 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2015-04-17 12:38:19,325 ERROR zookeeper.ZooKeeperWatcher: catalogtracker-on-hconnection-0x28ab34f2, quorum=slave-cm53.trafodion.org:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server




From: Trafodion-firefighters 
[mailto:trafodion-firefighters-bounces+arvind.narain=hp....@lists.launchpad.net]
 On Behalf Of Johnson, Stacey
Sent: Friday, April 17, 2015 9:31 AM
To: 
[email protected]<mailto:[email protected]>
Subject: [Trafodion-firefighters] Daily build 2015-04-17 08:30:00 UTC of 
trafodion/core -- Test Failures







Build failed.



- traf-pub-release-ahw2.2 http://logs.trafodion.org/daily/traf-pub-release-ahw2.2/39af503 : SUCCESS in 43m 05s
- traf-pub-debug-ahw2.2 http://logs.trafodion.org/daily/traf-pub-debug-ahw2.2/248d564 : SUCCESS in 34m 38s
- build-product-release http://logs.trafodion.org/daily/build-product-release/c040e47 : SUCCESS in 19m 16s
- build-product-debug http://logs.trafodion.org/daily/build-product-debug/54d57bd : SUCCESS in 17m 45s
- core-regress-core-cm5.3 http://logs.trafodion.org/daily/core-regress-core-cm5.3/73d3fde : SUCCESS in 2h 47m 15s
- core-regress-core-ahw2.2 http://logs.trafodion.org/daily/core-regress-core-ahw2.2/02746ac : SUCCESS in 2h 15m 53s
- core-regress-charsets-cm5.3 http://logs.trafodion.org/daily/core-regress-charsets-cm5.3/3560c5b : SUCCESS in 1h 28m 46s
- core-regress-charsets-ahw2.2 http://logs.trafodion.org/daily/core-regress-charsets-ahw2.2/278d998 : SUCCESS in 1h 40m 33s
- core-regress-qat-cm5.3 http://logs.trafodion.org/daily/core-regress-qat-cm5.3/c8953c9 : SUCCESS in 1h 24m 23s
- core-regress-qat-ahw2.2 http://logs.trafodion.org/daily/core-regress-qat-ahw2.2/87209ea : SUCCESS in 1h 33m 37s
- core-regress-udr-cm5.3 http://logs.trafodion.org/daily/core-regress-udr-cm5.3/9bbaa0f : SUCCESS in 1h 13m 12s
- core-regress-udr-ahw2.2 http://logs.trafodion.org/daily/core-regress-udr-ahw2.2/7ace4ab : SUCCESS in 1h 26m 53s
- core-regress-catman1-cm5.3 http://logs.trafodion.org/daily/core-regress-catman1-cm5.3/cec89c0 : SUCCESS in 2h 22m 36s
- core-regress-catman1-ahw2.2 http://logs.trafodion.org/daily/core-regress-catman1-ahw2.2/cd118c3 : SUCCESS in 2h 41m 54s
- core-regress-compGeneral-cm5.3 http://logs.trafodion.org/daily/core-regress-compGeneral-cm5.3/0870564 : SUCCESS in 2h 47m 59s
- core-regress-compGeneral-ahw2.2 http://logs.trafodion.org/daily/core-regress-compGeneral-ahw2.2/24671af : FAILURE in 31m 51s
- core-regress-executor-cm5.3 http://logs.trafodion.org/daily/core-regress-executor-cm5.3/9ae374f : FAILURE in 4h 01m 46s
- core-regress-executor-ahw2.2 http://logs.trafodion.org/daily/core-regress-executor-ahw2.2/df5362a : SUCCESS in 2h 18m 01s
- core-regress-fullstack2-cm5.3 http://logs.trafodion.org/daily/core-regress-fullstack2-cm5.3/c0c71d5 : FAILURE in 4h 00m 22s
- core-regress-fullstack2-ahw2.2 http://logs.trafodion.org/daily/core-regress-fullstack2-ahw2.2/e689523 : SUCCESS in 1h 06m 40s
- core-regress-hive-cm5.3 http://logs.trafodion.org/daily/core-regress-hive-cm5.3/316bb4c : FAILURE in 1h 49m 07s
- core-regress-hive-ahw2.2 http://logs.trafodion.org/daily/core-regress-hive-ahw2.2/d3d9a3f : FAILURE in 2h 00m 04s
- core-regress-seabase-cm5.3 http://logs.trafodion.org/daily/core-regress-seabase-cm5.3/7cb286d : FAILURE in 4h 01m 31s
- core-regress-seabase-ahw2.2 http://logs.trafodion.org/daily/core-regress-seabase-ahw2.2/60e71d0 : SUCCESS in 2h 10m 15s
- phoenix_part1_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part1_T4-cm5.3/76e5e7d : SUCCESS in 2h 21m 42s
- phoenix_part2_T4-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T4-cm5.3/898ae16 : FAILURE in 3h 22m 30s
- phoenix_part1_T4-ahw2.2 http://logs.trafodion.org/daily/phoenix_part1_T4-ahw2.2/594c172 : SUCCESS in 2h 11m 42s
- phoenix_part2_T4-ahw2.2 http://logs.trafodion.org/daily/phoenix_part2_T4-ahw2.2/4a65e23 : SUCCESS in 2h 18m 44s
- phoenix_part1_T2-cm5.3 http://logs.trafodion.org/daily/phoenix_part1_T2-cm5.3/a303691 : FAILURE in 1h 04m 35s (non-voting)
- phoenix_part2_T2-cm5.3 http://logs.trafodion.org/daily/phoenix_part2_T2-cm5.3/2aa431e : FAILURE in 51m 10s (non-voting)
- phoenix_part1_T2-ahw2.2 http://logs.trafodion.org/daily/phoenix_part1_T2-ahw2.2/c2c0bd2 : FAILURE in 1h 03m 42s (non-voting)
- phoenix_part2_T2-ahw2.2 http://logs.trafodion.org/daily/phoenix_part2_T2-ahw2.2/d655e16 : FAILURE in 59m 35s (non-voting)
- pyodbc_test-cm5.3 http://logs.trafodion.org/daily/pyodbc_test-cm5.3/01d6eeb : SUCCESS in 1h 13m 26s
- pyodbc_test-ahw2.2 http://logs.trafodion.org/daily/pyodbc_test-ahw2.2/b1a66ec : SUCCESS in 1h 13m 36s
- jdbc_test-cm5.3 http://logs.trafodion.org/daily/jdbc_test-cm5.3/fef27f8 : FAILURE in 1h 31m 41s
- jdbc_test-ahw2.2 http://logs.trafodion.org/daily/jdbc_test-ahw2.2/56f24a3 : SUCCESS in 1h 16m 37s
-- 
Mailing list: https://launchpad.net/~trafodion-firefighters
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~trafodion-firefighters
More help   : https://help.launchpad.net/ListHelp
