Hi Everyone,
It turns out that it was exactly that: a DNS server issue. We had to
get the Data Centre to confirm it, though.
Thanks!
On Fri, Nov 13, 2015 at 12:25 PM, Josef Roehrl - PHEMI
<[email protected]> wrote:
Hi All,
Three times in the past few weeks (twice on one system, once on
another), the master has started getting UnknownHostExceptions, one
by one, for each of the tablet servers. It then tries to stop them,
and eventually all the tablet servers quit.
The log entries look like this for every tablet server:
    12 08:14:01,0498  tserver:6  20  ERROR
        error sending update to tserver3:9997:
        org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

    12 09:01:53,0352  master:1  2  ERROR
        org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

    12 16:35:50,0672  master:1  10  ERROR
        unable to get tablet server status tserver3:9997[250e6cd2c500012]
        org.apache.thrift.transport.TTransportException: java.net.UnknownHostException
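Each TTransportException message above embeds java.net.UnknownHostException, which suggests (though the log text alone does not prove it) that the Thrift exception wraps the JDK's lookup failure as its cause. When triaging errors like these in code, a minimal sketch, assuming that wrapping, is to walk the cause chain. The class `RootCause` and method `isDnsFailure` are hypothetical helpers, not part of Accumulo or Thrift; only JDK types are used, and a plain `RuntimeException` stands in for `TTransportException`.

```java
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Sketch (not an Accumulo or Thrift API): reports whether a failure was
 * ultimately caused by a DNS lookup failure by walking the cause chain.
 */
public class RootCause {

    /** True if t, or any exception in its cause chain, is an UnknownHostException. */
    public static boolean isDnsFailure(Throwable t) {
        // Track visited exceptions by identity so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // RuntimeException stands in for Thrift's TTransportException here (assumption).
        Exception wrapped = new RuntimeException(new UnknownHostException("tserver3"));
        System.out.println(isDnsFailure(wrapped));                         // true
        System.out.println(isDnsFailure(new RuntimeException("timeout")));  // false
    }
}
```

Checking the whole chain, rather than only `getCause()`, keeps the helper working if another layer wraps the transport exception again.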
I've redacted the real host names, of course.
This could be a DNS problem, though each system had been running
fine for days before this happened (and the same scenario occurred
on both systems, which use quite different DNS servers).
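One way to test the DNS hypothesis from the JVM's side is a small standalone check run on the master host while the errors are occurring. This is a sketch, not an Accumulo tool: it resolves each hostname with the JDK's `InetAddress.getByName`, which throws `java.net.UnknownHostException` when a name cannot be resolved, and lists the names that fail. The class name `DnsCheck` and the default names `tserver1`..`tserver3` are hypothetical placeholders; it assumes Java 9 or later (for `List.of`).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: resolve each hostname with the JDK resolver and report the ones
 * that fail with UnknownHostException.
 */
public class DnsCheck {

    /** Returns the hostnames that could not be resolved, in input order. */
    public static List<String> unresolvable(List<String> hosts) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            try {
                InetAddress addr = InetAddress.getByName(host);
                System.out.println(host + " -> " + addr.getHostAddress());
            } catch (UnknownHostException e) {
                System.out.println(host + " -> UnknownHostException");
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Placeholder names; pass the real tablet server hostnames as arguments.
        List<String> hosts = args.length > 0
                ? List.of(args)
                : List.of("tserver1", "tserver2", "tserver3");
        List<String> failed = unresolvable(hosts);
        System.out.println(failed.isEmpty() ? "all hosts resolved" : "unresolvable: " + failed);
    }
}
```

For example, `javac DnsCheck.java && java DnsCheck <host1> <host2>`. Running it as a fresh process means it starts with an empty JVM lookup cache, so it reflects what the resolver returns now rather than what a long-running master JVM may have cached earlier (OS-level caching, if any, still applies).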
If anyone has a hint or has seen something like this, I would
appreciate any pointers.
I have looked at the JIRA issues regarding DNS outages, but none of
them seems to fit this pattern.
Thanks
--
Josef Roehrl
Senior Software Developer
*PHEMI Systems*
180-887 Great Northern Way
Vancouver, BC V5T 4T5
604-336-1119
Website <http://www.phemi.com/>
Twitter <https://twitter.com/PHEMISystems>
LinkedIn <http://www.linkedin.com/company/3561810?trk=tyah&trkInfo=tarId%3A1403279580554%2Ctas%3Aphemi%20hea%2Cidx%3A1-1-1>