[jira] [Created] (KUDU-2208) Subprocess::Wait should handle EINTR
Todd Lipcon created KUDU-2208:
---------------------------------

             Summary: Subprocess::Wait should handle EINTR
                 Key: KUDU-2208
                 URL: https://issues.apache.org/jira/browse/KUDU-2208
             Project: Kudu
          Issue Type: Bug
          Components: test, util
    Affects Versions: 1.6.0
            Reporter: Todd Lipcon

I saw this test failure in a TSAN build:

Bad status: Runtime error: Unable to wait() for /tmp/run_tha_testJyhcOT/build/tsan/bin/kudu: Unable to wait on child: Interrupted system call (error 4)

It seems like we need to add a RETRY_ON_EINTR loop to Subprocess::Wait.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
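
For readers unfamiliar with the pattern named in KUDU-2208, here is a minimal sketch of retrying waitpid() when it is interrupted by a signal. The helper names WaitPidRetryingOnEintr and ExitCodeOfChild are hypothetical and exist only for this illustration; this is not Kudu's RETRY_ON_EINTR macro or the actual Subprocess::Wait code.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not Kudu code): call waitpid() again whenever it
// fails with EINTR, so a signal delivered to the parent does not surface
// as "Interrupted system call".
pid_t WaitPidRetryingOnEintr(pid_t pid, int* status, int options) {
  pid_t rc;
  do {
    rc = waitpid(pid, status, options);
  } while (rc == -1 && errno == EINTR);
  return rc;
}

// Hypothetical usage: fork a child that exits with `exit_code` and return
// the code observed by the parent, or -1 on any failure.
int ExitCodeOfChild(int exit_code) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) _exit(exit_code);  // Child: exit immediately.
  int status = 0;
  if (WaitPidRetryingOnEintr(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Any other errno value still ends the loop and is returned to the caller, so genuine failures such as ECHILD are not hidden.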
[jira] [Updated] (KUDU-1809) ScanToken API does not respect batch size configuration
[ https://issues.apache.org/jira/browse/KUDU-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Hao updated KUDU-1809:
--------------------------
    Code Review: https://gerrit.cloudera.org/#/c/8435/

> ScanToken API does not respect batch size configuration
> -------------------------------------------------------
>
>                 Key: KUDU-1809
>                 URL: https://issues.apache.org/jira/browse/KUDU-1809
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>            Reporter: Dan Burkert
>            Assignee: Hao Hao
>            Priority: Major
>              Labels: newbie
>
> Both implementations of the ScanToken API ignore the scan batch size
> configuration option. The field is not even included in the ScanToken
> protobuf message. As a result, setting the option has no effect on
> deserialized scanners.
[jira] [Created] (KUDU-2209) HybridClock doesn't handle changes STA_NANO status flag
Todd Lipcon created KUDU-2209:
---------------------------------

             Summary: HybridClock doesn't handle changes STA_NANO status flag
                 Key: KUDU-2209
                 URL: https://issues.apache.org/jira/browse/KUDU-2209
             Project: Kudu
          Issue Type: Bug
          Components: server
    Affects Versions: 1.6.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Critical

Users have occasionally reported spurious crashes due to Kudu thinking that another node has a timestamp from the future. After some debugging I realized that the issue is that we currently capture the flag 'STA_NANO' from the kernel only at startup. This flag indicates whether the kernel's sub-second timestamp is in nanoseconds or microseconds. We initially assumed this was a static property of the kernel. However, it turns out that this flag can get toggled at runtime by ntp in certain circumstances.

Given this, it was possible for us to interpret a number of nanoseconds as if it were microseconds, resulting in a timestamp up to 1000 seconds in the future.
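
To make the arithmetic in KUDU-2209 concrete, the sketch below converts a kernel time reading to microseconds, consulting the status flags on every call instead of caching the unit at startup. The function names and the constant kStaNanoFlag are hypothetical; the constant is assumed to mirror the Linux STA_NANO bit (0x2000) from <sys/timex.h>. This is not the HybridClock implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors the Linux STA_NANO status bit; defined locally so the
// sketch does not depend on <sys/timex.h>.
constexpr int kStaNanoFlag = 0x2000;

// Hypothetical helper: normalize the kernel's sub-second field to
// microseconds using the status flags that came with the same reading.
int64_t SubSecondToMicros(int64_t subsec, int status) {
  return (status & kStaNanoFlag) ? subsec / 1000 : subsec;
}

// Hypothetical helper: full timestamp in microseconds since the epoch.
int64_t ToMicrosSinceEpoch(int64_t sec, int64_t subsec, int status) {
  return sec * 1000000 + SubSecondToMicros(subsec, status);
}
```

If the flag is read only once and the unit later changes, a sub-second value of 999,999,999 nanoseconds gets treated as 999,999,999 microseconds, which is just under 1000 seconds. That matches the upper bound given in the report.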
[jira] [Assigned] (KUDU-1578) kudu-tserver should refuse service or "freeze" instead of crash when NTP loses sync
[ https://issues.apache.org/jira/browse/KUDU-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned KUDU-1578:
---------------------------------
    Assignee: Todd Lipcon

> kudu-tserver should refuse service or "freeze" instead of crash when NTP loses sync
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-1578
>                 URL: https://issues.apache.org/jira/browse/KUDU-1578
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>            Reporter: zhangsong
>            Assignee: Todd Lipcon
>            Priority: Major
>
> Currently, kudu-tserver will crash when NTP is unsynchronized.
> However, this behavior may not be right in a large cluster, where a crash can
> lead to re-replication that is useless or harmful to cluster availability.
> Instead, kudu-tserver should suspend itself, for example by refusing to serve
> writes, and let the administrator decide what to do.
[jira] [Commented] (KUDU-1578) kudu-tserver should refuse service or "freeze" instead of crash when NTP loses sync
[ https://issues.apache.org/jira/browse/KUDU-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235089#comment-16235089 ]

Todd Lipcon commented on KUDU-1578:
-----------------------------------

I put up a patch at http://gerrit.cloudera.org:8080/8451 which partially addresses this. In particular, I didn't go through the complexity of trying to be "partially up" while NTP is down. Rather, I changed the clock to ride over brief periods of NTP synchronization loss while logging errors. Assuming typical configurations, this should allow Kudu to stay up even if NTP goes out for tens of minutes.

> kudu-tserver should refuse service or "freeze" instead of crash when NTP loses sync
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-1578
>                 URL: https://issues.apache.org/jira/browse/KUDU-1578
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>            Reporter: zhangsong
>            Assignee: Todd Lipcon
>            Priority: Major
>
> Currently, kudu-tserver will crash when NTP is unsynchronized.
> However, this behavior may not be right in a large cluster, where a crash can
> lead to re-replication that is useless or harmful to cluster availability.
> Instead, kudu-tserver should suspend itself, for example by refusing to serve
> writes, and let the administrator decide what to do.
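
One way a clock could "ride over" a brief loss of synchronization, as the comment on KUDU-1578 describes, is to extrapolate from the last synchronized reading and widen the error bound by an assumed worst-case drift until it exceeds a limit. The sketch below illustrates that idea only. The struct, the function, the drift rate, and the error limit are all hypothetical and are not taken from the patch or from Kudu's configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the last reading taken while NTP was synchronized.
struct LastGoodReading {
  int64_t mono_us;   // Monotonic clock at that moment, in microseconds.
  int64_t wall_us;   // Wall-clock time at that moment, in microseconds.
  int64_t error_us;  // Error bound reported at that moment, in microseconds.
};

// Hypothetical helper: estimate the current time while unsynchronized.
// Returns false once the extrapolated error exceeds max_error_us, which is
// the point where a caller would stop serving instead of guessing.
bool ExtrapolateWhileUnsynced(const LastGoodReading& last, int64_t mono_now_us,
                              int64_t drift_ppm, int64_t max_error_us,
                              int64_t* wall_now_us, int64_t* error_now_us) {
  const int64_t elapsed_us = mono_now_us - last.mono_us;
  if (elapsed_us < 0) return false;
  // Error grows linearly with elapsed time at the assumed worst-case drift.
  const int64_t error_us = last.error_us + elapsed_us * drift_ppm / 1000000;
  if (error_us > max_error_us) return false;
  *wall_now_us = last.wall_us + elapsed_us;
  *error_now_us = error_us;
  return true;
}
```

With these illustrative inputs, the longest tolerated outage is roughly (max_error_us - last.error_us) / drift_ppm seconds, so the two parameters determine how long the server can stay up without NTP.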
[jira] [Resolved] (KUDU-2200) Sanity-check that users specify the right number of masters when connecting
[ https://issues.apache.org/jira/browse/KUDU-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-2200.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

> Sanity-check that users specify the right number of masters when connecting
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-2200
>                 URL: https://issues.apache.org/jira/browse/KUDU-2200
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, master, supportability
>    Affects Versions: 1.6.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>             Fix For: 1.6.0
>
> A common issue I've seen is that users set up an HA master setup (3 masters)
> but then in various cases only specify one of the masters when they try to
> connect using the client. This currently will work if it happens that they
> picked the leader master, and otherwise will return a somewhat confusing "no
> leader" error message.
> We should improve usability here by having the master send back a list of the
> master addresses in the case that it isn't the leader, and the client can
> use this to provide a more actionable error message like "Client connection
> specified only a subset of the cluster's masters" or somesuch.
> I wouldn't want to automatically reconfigure the client and reconnect
> because this puts the client in a configuration state that will fail once the
> one master they specified goes down.
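
The check proposed in KUDU-2200 can be sketched as a comparison between the addresses the user passed to the client and the list a master reports back. The function name, the address strings, and the plain string comparison are assumptions made for this illustration; a real client would likely need to resolve hostnames before comparing, and this is not the code that resolved the issue.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns an empty string when the client was given
// exactly the cluster's masters, otherwise an actionable error message.
std::string CheckMasterAddresses(const std::vector<std::string>& specified,
                                 const std::vector<std::string>& reported) {
  const std::set<std::string> spec(specified.begin(), specified.end());
  const std::set<std::string> cluster(reported.begin(), reported.end());
  if (spec == cluster) return "";
  // std::set iterates in sorted order, which std::includes requires.
  const bool is_subset = std::includes(cluster.begin(), cluster.end(),
                                       spec.begin(), spec.end());
  if (is_subset) {
    return "Client connection specified only a subset of the cluster's "
           "masters: " + std::to_string(spec.size()) + " of " +
           std::to_string(cluster.size());
  }
  return "Client connection specified master addresses that the cluster "
         "does not report";
}
```

The sketch only reports the mismatch and never rewrites the client's configuration, in line with the concern in the issue about silently reconnecting with a setup that fails once the single specified master goes down.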
[jira] [Assigned] (KUDU-1411) Implement HT timestamp propagation in the Java client
[ https://issues.apache.org/jira/browse/KUDU-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Hao reassigned KUDU-1411:
-----------------------------
    Assignee: Hao Hao

> Implement HT timestamp propagation in the Java client
> -----------------------------------------------------
>
>                 Key: KUDU-1411
>                 URL: https://issues.apache.org/jira/browse/KUDU-1411
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client
>            Reporter: Dan Burkert
>            Assignee: Hao Hao
>            Priority: Major
>
> The Java client doesn't do any timestamp propagation. The ScanToken
> implementation should also be updated so that deserializing a ScanToken
> results in propagating the timestamp of the serializer into the
> deserializer.
[jira] [Updated] (KUDU-1411) Implement HT timestamp propagation in the Java client
[ https://issues.apache.org/jira/browse/KUDU-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Hao updated KUDU-1411:
--------------------------
    Code Review: https://gerrit.cloudera.org/#/c/8452/

> Implement HT timestamp propagation in the Java client
> -----------------------------------------------------
>
>                 Key: KUDU-1411
>                 URL: https://issues.apache.org/jira/browse/KUDU-1411
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client
>            Reporter: Dan Burkert
>            Assignee: Hao Hao
>            Priority: Major
>
> The Java client doesn't do any timestamp propagation. The ScanToken
> implementation should also be updated so that deserializing a ScanToken
> results in propagating the timestamp of the serializer into the
> deserializer.