[jira] [Updated] (KUDU-1466) C++ client errors misreported as GetTableLocations timeouts
[ https://issues.apache.org/jira/browse/KUDU-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-1466:
    Target Version/s: 1.8.0  (was: 1.7.0)

> C++ client errors misreported as GetTableLocations timeouts
> ---
>
>                 Key: KUDU-1466
>                 URL: https://issues.apache.org/jira/browse/KUDU-1466
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>   Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Alexey Serbin
>            Priority: Critical
>
> client-test is currently very flaky due to this issue:
> - we are injecting some kind of failure on the tablet server (eg DNS
>   resolution failure)
> - when we fail to connect to the TS, we correctly re-trigger a lookup against
>   the master
> - depending how the backoffs and retries line up, we sometimes end up
>   triggering the lookup retry when the remaining operation budget is very short
>   (eg <10ms)
> -- this GetTabletLocations RPC times out since the master is unable to
>    respond within the ridiculously short timeout
>
> During the course of retrying some operation, we should probably not replace
> the 'last_error' with a master error, so long as we have had at least one
> successful master lookup (thus indicating that the master is not the problem)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
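The rule proposed at the end of the description (keep the earlier error rather than overwrite it with a master error once at least one master lookup has succeeded) could be sketched roughly as follows. This is only an illustration under stated assumptions: `Status`, `ErrorSource`, and `RetryErrorTracker` are hypothetical names invented here, not the Kudu client's actual classes or the fix that was eventually committed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a status value; NOT Kudu's actual Status class.
struct Status {
  bool ok;
  std::string message;
};

inline Status OkStatus() { return Status{true, ""}; }
inline Status ErrorStatus(const std::string& msg) { return Status{false, msg}; }

// Where a failed attempt originated (names are assumptions for this sketch).
enum class ErrorSource { kTabletServer, kMaster };

// Tracks the error to report if a retried operation finally gives up.
class RetryErrorTracker {
 public:
  // Call after any master lookup completes successfully.
  void RecordMasterLookupSuccess() { master_lookup_succeeded_ = true; }

  // Call after a failed attempt. A master error does not overwrite an
  // earlier error once the master has answered at least one lookup,
  // since that indicates the master is probably not the real problem.
  void RecordError(ErrorSource source, const Status& error) {
    assert(!error.ok);
    const bool have_earlier_error = !last_error_.ok;
    if (source == ErrorSource::kMaster && master_lookup_succeeded_ &&
        have_earlier_error) {
      return;  // Keep the earlier, more informative error.
    }
    last_error_ = error;
  }

  const Status& last_error() const { return last_error_; }

 private:
  bool master_lookup_succeeded_ = false;
  Status last_error_ = OkStatus();
};
```

With a tracker like this, a tablet server failure (for example the injected DNS resolution failure) followed by a master lookup that times out on a nearly exhausted budget would still be reported as the tablet server failure, provided an earlier master lookup had succeeded.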
[jira] [Updated] (KUDU-1466) C++ client errors misreported as GetTableLocations timeouts
[ https://issues.apache.org/jira/browse/KUDU-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1466:
    Target Version/s: 1.7.0  (was: 1.6.0)
[jira] [Updated] (KUDU-1466) C++ client errors misreported as GetTableLocations timeouts
[ https://issues.apache.org/jira/browse/KUDU-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated KUDU-1466:
    Target Version/s: 1.6.0  (was: 1.5.0)
[jira] [Updated] (KUDU-1466) C++ client errors misreported as GetTableLocations timeouts
[ https://issues.apache.org/jira/browse/KUDU-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-1466:
    Target Version/s: 1.4.0  (was: 1.3.0)