[ https://issues.apache.org/jira/browse/KUDU-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570017#comment-15570017 ]
Todd Lipcon commented on KUDU-1418:
-----------------------------------

[~jdcryans] Did we get to the root of this? There have been various fixes on the Java client since this was reported, but I don't know if they match up.

> [java client] Master lookups can vanish under certain conditions
> ----------------------------------------------------------------
>
>                 Key: KUDU-1418
>                 URL: https://issues.apache.org/jira/browse/KUDU-1418
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>
> While testing Kudu with our internal QA tools(1), we found that both DNS failure injection and elastic partitioning between clients and the master trigger a bug where master lookups just... vanish. Here's an example:
> {noformat}
> 2016-04-13 22:18:55,506 WARN [New I/O boss #9] org.kududb.client.GetMasterRegistrationReceived: Error receiving a response from: francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051
> org.kududb.client.ConnectionResetException: [Peer Kudu Master - francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051] Connection reset on [id: 0x9bd8ed44]
>     at org.kududb.client.TabletClient.cleanup(TabletClient.java:630)
> (stack trace)
> 2016-04-13 22:18:55,507 WARN [New I/O boss #9] org.kududb.client.GetMasterRegistrationReceived: Unable to find the leader master (francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051), will retry
> 2016-04-13 22:18:55,507 DEBUG [New I/O boss #9] org.kududb.client.AsyncKuduClient: Going to sleep for 1017 at retry 2
> 2016-04-13 22:18:55,507 DEBUG [New I/O worker #7] org.kududb.client.TabletClient: [Peer Kudu Master - francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051] [id: 0x9bd8ed44] CLOSED
> (unrelated debug logs)
> 2016-04-13 22:28:44,951 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Couldn't flush the head row, KuduRpc(method=Write, tablet=null, attempt=1, DeadlineTracker(timeout=0, elapsed=600001), null) row_key=(int64 key1=-721818921243156941, int64 key2=5432210168070573172)
>     at org.kududb.mapreduce.tools.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:516)
> {noformat}
> The client tries to reach the master, fails, says it's gonna retry in a second... then nothing until ITBLL times out 10 minutes later.
> 1. https://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)