[jira] [Updated] (KUDU-1878) Java openTable() call eagerly connects to all tablets
[ https://issues.apache.org/jira/browse/KUDU-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1878:
------------------------------
    Target Version/s: 1.4.0  (was: 1.3.0)

> Java openTable() call eagerly connects to all tablets
> -----------------------------------------------------
>
>                 Key: KUDU-1878
>                 URL: https://issues.apache.org/jira/browse/KUDU-1878
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, impala, java, perf
>    Affects Versions: 1.2.0
>            Reporter: Todd Lipcon
>
> In a secure cluster, I noticed that every time I run an Impala query, it
> issues new service tickets for every tablet server on the coordinator node.
> It seems like it is creating a new connection to each tablet server, despite
> not actually needing to access any data. This is probably a big part of why
> planning is slow for larger clusters.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1807) GetTableSchema() is O(n) in the number of tablets
[ https://issues.apache.org/jira/browse/KUDU-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1807:
------------------------------
    Target Version/s: 1.4.0  (was: 1.3.0)
            Priority: Critical  (was: Major)

Bumping to 1.4, but also bumping priority since this is important for larger clusters.

> GetTableSchema() is O(n) in the number of tablets
> -------------------------------------------------
>
>                 Key: KUDU-1807
>                 URL: https://issues.apache.org/jira/browse/KUDU-1807
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, perf
>    Affects Versions: 1.2.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> GetTableSchema calls TableInfo::IsCreateTableDone. This method checks each
> tablet for whether it is in the correct state, which requires acquiring the
> RWC lock for every tablet. This is somewhat slow for large tables with
> thousands of tablets, and this is actually a relatively hot path because
> every task in an Impala query ends up calling GetTableSchema() when it opens
> its scanner.
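The cost described in KUDU-1807 can be illustrated with a small sketch. This is not Kudu code: `TableInfo` and `Tablet` below are hypothetical Python stand-ins, and the cached-counter variant is only one conceivable way to avoid the per-tablet scan, not the fix chosen for this issue.

```python
import threading


class Tablet:
    """Hypothetical stand-in for a tablet guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.running = False


class TableInfo:
    """Hypothetical stand-in for the master's per-table metadata."""

    def __init__(self, num_tablets):
        self.tablets = [Tablet() for _ in range(num_tablets)]
        # Assumption: a counter of tablets that are not yet running,
        # kept up to date whenever a tablet changes state.
        self._pending = num_tablets
        self._pending_lock = threading.Lock()

    def is_create_done_scan(self):
        # The pattern from the report: one lock acquisition per tablet,
        # so every call is O(n) in the number of tablets.
        for tablet in self.tablets:
            with tablet.lock:
                if not tablet.running:
                    return False
        return True

    def mark_running(self, index):
        tablet = self.tablets[index]
        with tablet.lock:
            if tablet.running:
                return  # already counted
            tablet.running = True
        with self._pending_lock:
            self._pending -= 1

    def is_create_done_cached(self):
        # O(1): a single lock and a single comparison per call.
        with self._pending_lock:
            return self._pending == 0
```

Both checks return the same answer; the difference is that the scan pays for every tablet on every call, which matters when each scanner open triggers the check.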
[jira] [Resolved] (KUDU-1015) CM health alert brainstorming
[ https://issues.apache.org/jira/browse/KUDU-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1015.
-------------------------------
       Resolution: Invalid
    Fix Version/s: n/a

This doesn't belong on this bug tracker.

> CM health alert brainstorming
> -----------------------------
>
>                 Key: KUDU-1015
>                 URL: https://issues.apache.org/jira/browse/KUDU-1015
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ops-tooling
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>             Fix For: n/a
>
> Opening this ticket to start collecting ideas about health checks we should
> include in the GA CM integration.
[jira] [Resolved] (KUDU-393) Clean up repeated code in ConsensusQueue dump functions
[ https://issues.apache.org/jira/browse/KUDU-393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-393.
------------------------------
       Resolution: Won't Fix
    Fix Version/s: n/a

> Clean up repeated code in ConsensusQueue dump functions
> -------------------------------------------------------
>
>                 Key: KUDU-393
>                 URL: https://issues.apache.org/jira/browse/KUDU-393
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: consensus
>    Affects Versions: M4.5
>            Reporter: Todd Lipcon
>            Assignee: David Alves
>            Priority: Trivial
>             Fix For: n/a
>
> David pointed out in
> http://gerrit.ent.cloudera.com:8080/?l=340#/c/3442/1/src/consensus/consensus_queue.cc
> that DumpToStrings and DumpToHtml share some common code. It's not trivial
> to share code, but he thought it might be able to be refactored a little.
[jira] [Resolved] (KUDU-1896) enable redaction of data in web UI/tracing
[ https://issues.apache.org/jira/browse/KUDU-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1896.
-------------------------------
       Resolution: Fixed
         Assignee: Todd Lipcon
    Fix Version/s: 1.3.0

https://gerrit.cloudera.org/#/c/6234/ added the ability to disable the web UI.

The above note wasn't quite right - Hao's working on adding test coverage for htpasswd, but the support is already there. So, let's call this one done.

> enable redaction of data in web UI/tracing
> ------------------------------------------
>
>                 Key: KUDU-1896
>                 URL: https://issues.apache.org/jira/browse/KUDU-1896
>             Project: Kudu
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 1.3.0
>
> Currently even if security is enabled, you can use the web UI
> tracing/rpcz/etc to see current RPC traffic, including tokens returned from
> ConnectToMaster RPCs, etc. We need to enable redaction for the stringified
> PBs in the traces, and probably offer the ability to disable web UI pages
> entirely.
[jira] [Updated] (KUDU-1906) Java client hangs in scanner after read timeout
[ https://issues.apache.org/jira/browse/KUDU-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1906:
------------------------------
    Status: In Review  (was: Open)

> Java client hangs in scanner after read timeout
> -----------------------------------------------
>
>                 Key: KUDU-1906
>                 URL: https://issues.apache.org/jira/browse/KUDU-1906
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>
> Running RowCounter against a reasonably small table, I see map tasks
> occasionally hanging waiting for a response from a scanner. It seems to be
> correlated with "read timeouts" in the logs. I'm guessing we are not handling
> something properly in this case.
[jira] [Created] (KUDU-1906) Java client hangs in scanner after read timeout
Todd Lipcon created KUDU-1906:
---------------------------------

             Summary: Java client hangs in scanner after read timeout
                 Key: KUDU-1906
                 URL: https://issues.apache.org/jira/browse/KUDU-1906
             Project: Kudu
          Issue Type: Bug
          Components: client
    Affects Versions: 1.3.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker

Running RowCounter against a reasonably small table, I see map tasks
occasionally hanging waiting for a response from a scanner. It seems to be
correlated with "read timeouts" in the logs. I'm guessing we are not handling
something properly in this case.
[jira] [Commented] (KUDU-1893) Queries incorrectly yielding null results
[ https://issues.apache.org/jira/browse/KUDU-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893046#comment-15893046 ]

Andrew Wong commented on KUDU-1893:
-----------------------------------

Fixed in 3907340a28bad494cc28a0c93431372a811159d0.

> Queries incorrectly yielding null results
> -----------------------------------------
>
>                 Key: KUDU-1893
>                 URL: https://issues.apache.org/jira/browse/KUDU-1893
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile
>    Affects Versions: 1.2.0
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Blocker
>
> The queries were on an added, nullable, int32 column with auto encoding and
> default compression. Range and equality predicates like `projectId == 32` or
> `projectId >= 32` yielded the result for projectId 32, as well as a large
> number of rows with null projectId.
> Response looks something like:
> {code}
> STRING projectName=eq179_eq430_1.3_1207, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-07T17:25:18.696000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-07T17:25:18.696000Z, INT32 projectId=NULL
> STRING projectName=santosh_collection_12, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-11T01:52:04.36Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-11T01:52:04.36Z, INT32 projectId=NULL
> STRING projectName=To Test Cancel Import, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=5, UNIXTIME_MICROS createDate=2016-12-13T00:30:26.01Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-13T00:30:26.01Z, INT32 projectId=NULL
> STRING projectName=test_hmm_1, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-13T14:03:01.81Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-13T14:03:01.81Z, INT32 projectId=NULL
> STRING projectName=Test_M_151216_1, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-15T14:39:26.684000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-15T14:39:26.684000Z, INT32 projectId=NULL
> STRING projectName=test_1_M, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-15T14:59:46.206000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-15T14:59:46.206000Z, INT32 projectId=NULL
> STRING projectName=mm-dev-2, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-15T21:01:26.942000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-15T21:01:26.942000Z, INT32 projectId=NULL
> STRING projectName=abc, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-16T04:23:46.032000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-16T04:23:46.032000Z, INT32 projectId=NULL
> STRING projectName=abcdefg, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-16T04:57:09.212000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-16T04:57:09.212000Z, INT32 projectId=NULL
> STRING projectName=testNew, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-16T10:44:12.13Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-16T10:44:12.13Z, INT32 projectId=NULL
> STRING projectName=test_1_M_191216, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=6, UNIXTIME_MICROS createDate=2016-12-19T16:36:07.774000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-19T16:36:07.774000Z, INT32 projectId=NULL
> STRING projectName=ePF-Test_Dec 22_504140723, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=57, UNIXTIME_MICROS createDate=2016-12-22T23:15:32.30Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-22T23:15:32.30Z, INT32 projectId=NULL
> STRING projectName=mrj_22DecHours_, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=57, UNIXTIME_MICROS createDate=2016-12-22T23:55:01.963000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-22T23:55:01.963000Z, INT32 projectId=NULL
> STRING projectName=mrj_22Dec_1658Hours, STRING projectDescription=NULL, STRING comment=NULL, STRING createUserId=57, UNIXTIME_MICROS createDate=2016-12-22T23:58:48.292000Z, STRING updateUserId=NULL, UNIXTIME_MICROS updateDate=2016-12-22T23:58:48.292000Z, INT32 projectId=NULL
> STRING projectName=mrj_22Dec_1715Hours, STRING
[jira] [Resolved] (KUDU-1893) Queries incorrectly yielding null results
[ https://issues.apache.org/jira/browse/KUDU-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong resolved KUDU-1893.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.3.0

> Queries incorrectly yielding null results
> -----------------------------------------
>
>                 Key: KUDU-1893
>                 URL: https://issues.apache.org/jira/browse/KUDU-1893
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile
>    Affects Versions: 1.2.0
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Blocker
>             Fix For: 1.3.0
>
> The queries were on an added, nullable, int32 column with auto encoding and
> default compression. Range and equality predicates like `projectId == 32` or
> `projectId >= 32` yielded the result for projectId 32, as well as a large
> number of rows with null projectId.
[jira] [Commented] (KUDU-1905) TS can not start
[ https://issues.apache.org/jira/browse/KUDU-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892856#comment-15892856 ]

David Alves commented on KUDU-1905:
-----------------------------------

[~YulongZ] What is the schema for the table?

> TS can not start
> ----------------
>
>                 Key: KUDU-1905
>                 URL: https://issues.apache.org/jira/browse/KUDU-1905
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.2.0
>         Environment: kudu1.2、CDH5.10
>            Reporter: YulongZ
>            Assignee: David Alves
>            Priority: Blocker
>
> TabletServer can not start, here is the log:
> Mar 1, 8:40:03.872 PM ERROR compaction.cc:966
> Status: Corruption: empty changelist - expected column updates Unable to decode changelist.
> Source Row: RowIdxInBlock: 0; Base: (string party_id=, string cust_id=, string source_system=); Undo Mutations: [@6096334378197721088(DELETE)]; Redo Mutations: [@6096334378560618496(DELETE), @6096334430649839616([invalid: Corruption: empty changelist - expected column updates])];
> Dest Row: RowIdxInBlock: 0; Base: (string party_id=, string cust_id=, string source_system=); Undo Mutations: [@6096334378560618496(DELETE)]; Redo Mutations: [@6096334378197721088(DELETE)];
> Mar 1, 8:40:03.872 PM FATAL tablet_peer_mm_ops.cc:128
> Check failed: _s.ok() FlushMRS failed on 325bb987ab604d8d9629f8ba4153f7d6: Corruption: Flush to disk failed: empty changelist - expected column updates
[jira] [Assigned] (KUDU-1905) TS can not start
[ https://issues.apache.org/jira/browse/KUDU-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned KUDU-1905:
---------------------------------
    Assignee: David Alves

David's looking into this one.

> TS can not start
> ----------------
>
>                 Key: KUDU-1905
>                 URL: https://issues.apache.org/jira/browse/KUDU-1905
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.2.0
>         Environment: kudu1.2、CDH5.10
>            Reporter: YulongZ
>            Assignee: David Alves
>            Priority: Blocker
>
> TabletServer can not start, here is the log:
> Mar 1, 8:40:03.872 PM ERROR compaction.cc:966
> Status: Corruption: empty changelist - expected column updates Unable to decode changelist.
> Source Row: RowIdxInBlock: 0; Base: (string party_id=, string cust_id=, string source_system=); Undo Mutations: [@6096334378197721088(DELETE)]; Redo Mutations: [@6096334378560618496(DELETE), @6096334430649839616([invalid: Corruption: empty changelist - expected column updates])];
> Dest Row: RowIdxInBlock: 0; Base: (string party_id=, string cust_id=, string source_system=); Undo Mutations: [@6096334378560618496(DELETE)]; Redo Mutations: [@6096334378197721088(DELETE)];
> Mar 1, 8:40:03.872 PM FATAL tablet_peer_mm_ops.cc:128
> Check failed: _s.ok() FlushMRS failed on 325bb987ab604d8d9629f8ba4153f7d6: Corruption: Flush to disk failed: empty changelist - expected column updates
[jira] [Updated] (KUDU-1580) Connection negotiation timeout to tablet server is treated as unretriable error
[ https://issues.apache.org/jira/browse/KUDU-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-1580:
--------------------------------
    Target Version/s: 1.4.0  (was: 1.3.0)

> Connection negotiation timeout to tablet server is treated as unretriable error
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-1580
>                 URL: https://issues.apache.org/jira/browse/KUDU-1580
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.10.0
>            Reporter: Todd Lipcon
>            Assignee: Alexey Serbin
>            Priority: Critical
>
> In the case that the leader tablet server is up but "frozen", the client will
> get a connection negotiation timeout trying to establish an RPC connection.
> It appears that this Status::TimeOut() is treated as a non-retriable error by
> WriteRpc::AnalyzeResponse, so the client gets a failure even if there has
> been a leader re-election within the client-provided deadline.
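The behavior KUDU-1580 asks for can be sketched as a retry loop that treats a negotiation timeout as retriable for as long as the caller's deadline has not passed. This is an illustration only, not the Kudu client: `NegotiationTimeout`, `write_with_retries`, and the fixed backoff are hypothetical names and choices made for the example.

```python
import time


class NegotiationTimeout(Exception):
    """Hypothetical error: connection negotiation did not finish in time."""


def write_with_retries(attempt, deadline_s, backoff_s=0.01):
    """Call attempt() until it succeeds or deadline_s seconds have elapsed.

    Assumption for this sketch: a negotiation timeout is retriable, because
    a new leader may have been elected while the old one was frozen.
    """
    deadline = time.monotonic() + deadline_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            return attempt()
        except NegotiationTimeout as error:
            last_error = error
            remaining = deadline - time.monotonic()
            # Fixed backoff, capped so we never sleep past the deadline.
            time.sleep(max(0.0, min(backoff_s, remaining)))
    raise TimeoutError(f"deadline exceeded; last error: {last_error}")
```

A caller whose first attempts hit the frozen leader still succeeds as long as a later attempt lands within the deadline; only an exhausted deadline surfaces as a failure.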
[jira] [Updated] (KUDU-1704) Add a new read mode to perform bounded staleness snapshot reads
[ https://issues.apache.org/jira/browse/KUDU-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-1704:
--------------------------------
    Target Version/s:   (was: 1.3.0)

> Add a new read mode to perform bounded staleness snapshot reads
> ---------------------------------------------------------------
>
>                 Key: KUDU-1704
>                 URL: https://issues.apache.org/jira/browse/KUDU-1704
>             Project: Kudu
>          Issue Type: Sub-task
>    Affects Versions: 1.1.0
>            Reporter: David Alves
>            Assignee: Alexey Serbin
>
> It would be useful to be able to perform snapshot reads at a timestamp that
> is higher than a client provided timestamp, thus improving recency, but lower
> than the server's oldest inflight transaction, thus minimizing the scan's
> chance to block.
> Such a mode would not guarantee linearizability, but would still allow for
> client-local read-your-writes, which seems to be one of the properties users
> care about the most.
> This should likely be the new default read mode for scanners.
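The timestamp choice proposed in KUDU-1704 can be sketched as follows. This is a reading of the description rather than Kudu's implementation: the function name, the integer timestamps, and the `server_now` parameter are assumptions made for the example, and the sketch allows the chosen timestamp to equal the client's timestamp when nothing newer is safe.

```python
def choose_snapshot_timestamp(client_ts, oldest_inflight_ts, server_now):
    """Pick a snapshot timestamp for a bounded staleness read.

    client_ts: last timestamp the client observed (lower bound, which
        preserves client-local read-your-writes).
    oldest_inflight_ts: timestamp of the server's oldest in-flight
        transaction, or None if there is none.
    server_now: the server's current timestamp.
    """
    if oldest_inflight_ts is None:
        upper = server_now
    else:
        # Stay strictly below the oldest in-flight transaction so the
        # scan does not have to wait for it.
        upper = min(server_now, oldest_inflight_ts - 1)
    # Never go below what the client has already seen, even if that means
    # the scan may block on an in-flight transaction.
    return max(client_ts, upper)
```

For example, with `client_ts=100`, `oldest_inflight_ts=150`, and `server_now=200` the sketch picks 149: more recent than the client's view, but below the in-flight transaction.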
[jira] [Updated] (KUDU-1034) Client does not fail over due to timeout
[ https://issues.apache.org/jira/browse/KUDU-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-1034:
--------------------------------
    Target Version/s: 1.4.0  (was: 1.3.0)

> Client does not fail over due to timeout
> ----------------------------------------
>
>                 Key: KUDU-1034
>                 URL: https://issues.apache.org/jira/browse/KUDU-1034
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>    Affects Versions: Feature Complete
>            Reporter: Mike Percy
>            Assignee: Alexey Serbin
>            Priority: Critical
>         Attachments: client_timeout_fail.patch, client_timeout_flush_hang.patch
>
> The client will not fail over due to a timeout error. Attaching a failing
> test case.
> I just made the test case part of RaftConsensusITest because it was
> convenient, maybe it should go elsewhere.
[jira] [Updated] (KUDU-1466) C++ client errors misreported as GetTableLocations timeouts
[ https://issues.apache.org/jira/browse/KUDU-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-1466:
--------------------------------
    Target Version/s: 1.4.0  (was: 1.3.0)

> C++ client errors misreported as GetTableLocations timeouts
> -----------------------------------------------------------
>
>                 Key: KUDU-1466
>                 URL: https://issues.apache.org/jira/browse/KUDU-1466
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Alexey Serbin
>            Priority: Critical
>
> client-test is currently very flaky due to this issue:
> - we are injecting some kind of failure on the tablet server (eg DNS
>   resolution failure)
> - when we fail to connect to the TS, we correctly re-trigger a lookup against
>   the master
> - depending how the backoffs and retries line up, we sometimes end up
>   triggering the lookup retry when the remaining operation budget is very
>   short (eg <10ms)
> -- this GetTabletLocations RPC times out since the master is unable to
>    respond within the ridiculously short timeout
> During the course of retrying some operation, we should probably not replace
> the 'last_error' with a master error, so long as we have had at least one
> successful master lookup (thus indicating that the master is not the problem)
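The suggestion at the end of KUDU-1466 can be sketched as a small piece of bookkeeping around the retry loop. This is a hypothetical illustration, not the C++ client's code: the `RetryState` class and its method names are invented for the example, and only the rule itself (keep the earlier error once a master lookup has succeeded) comes from the issue.

```python
class RetryState:
    """Hypothetical tracker for the error reported when retries run out."""

    def __init__(self):
        self.last_error = None
        self.master_lookup_succeeded = False

    def record_master_success(self):
        # At least one lookup worked, so the master is not the problem.
        self.master_lookup_succeeded = True

    def record_error(self, error, from_master):
        # Once a master lookup has succeeded, a later master error (for
        # example a timeout caused by a nearly exhausted deadline) should
        # not hide the error that started the retries.
        if (from_master and self.master_lookup_succeeded
                and self.last_error is not None):
            return
        self.last_error = error
```

With this rule, a tablet server DNS failure followed by a successful lookup and then a master timeout still reports the DNS failure, which is the error a user would need to see.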