[jira] [Created] (KUDU-2208) Subprocess::Wait should handle EINTR

2017-11-01 Thread Todd Lipcon (JIRA)
Todd Lipcon created KUDU-2208:
-

 Summary: Subprocess::Wait should handle EINTR
 Key: KUDU-2208
 URL: https://issues.apache.org/jira/browse/KUDU-2208
 Project: Kudu
  Issue Type: Bug
  Components: test, util
Affects Versions: 1.6.0
Reporter: Todd Lipcon


I saw this test failure in a TSAN build: Bad status: Runtime error: Unable to 
wait() for /tmp/run_tha_testJyhcOT/build/tsan/bin/kudu: Unable to wait on 
child: Interrupted system call (error 4)

It seems like we need to add a RETRY_ON_EINTR loop to Subprocess::Wait.
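A minimal sketch of such a loop, assuming a POSIX system. WaitRetryingOnEintr, OnAlarm, and DemoInterruptedWait are hypothetical names, not Kudu's Subprocess code, and the retry is written out by hand rather than through Kudu's RETRY_ON_EINTR macro. The demo installs a SIGALRM handler without SA_RESTART so that the alarm interrupts waitpid() with EINTR, the same "Interrupted system call" error seen above; the loop then simply calls waitpid() again.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: wait for `pid`, restarting waitpid() whenever a signal
// interrupts it (errno == EINTR). Returns the pid on success, or -1 with errno
// set for any other failure.
pid_t WaitRetryingOnEintr(pid_t pid, int* status) {
  pid_t ret;
  do {
    ret = waitpid(pid, status, 0);
  } while (ret == -1 && errno == EINTR);
  return ret;
}

// No-op handler; its only purpose is to interrupt the blocking waitpid().
extern "C" void OnAlarm(int) {}

// Hypothetical demo: the child exits with code 3 after 2 seconds, while a
// SIGALRM delivered after 1 second interrupts the parent's wait.
// Returns the child's exit code, or -1 on failure.
int DemoInterruptedWait() {
  struct sigaction sa = {};
  sa.sa_handler = OnAlarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;  // no SA_RESTART, so waitpid() fails with EINTR
  if (sigaction(SIGALRM, &sa, nullptr) != 0) return -1;

  pid_t child = fork();
  if (child == -1) return -1;
  if (child == 0) {
    sleep(2);
    _exit(3);
  }
  alarm(1);
  int status = 0;
  if (WaitRetryingOnEintr(child, &status) != child) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Retrying is safe here because a waitpid() call that fails with EINTR has not reaped the child, so the next call still finds it.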



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KUDU-1809) ScanToken API does not respect batch size configuration

2017-11-01 Thread Hao Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao updated KUDU-1809:
--
Code Review: https://gerrit.cloudera.org/#/c/8435/

> ScanToken API does not respect batch size configuration
> ---
>
> Key: KUDU-1809
> URL: https://issues.apache.org/jira/browse/KUDU-1809
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Reporter: Dan Burkert
>Assignee: Hao Hao
>Priority: Major
>  Labels: newbie
>
> Both implementations of the ScanToken API ignore the scan batch size 
> configuration option. The field is not even included in the ScanToken 
> protobuf message, so setting the option has no effect on 
> deserialized scanners.
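To illustrate why an option that is never serialized is lost, here is a hypothetical round-trip in C++. TokenSketch, Serialize, and Deserialize are illustrative stand-ins, not Kudu's ScanToken protobuf or client API; in Kudu the equivalent change would presumably add the field to the ScanToken protobuf message mentioned above. The point is only that the batch size must be written into the token and read back so the rehydrated scanner can apply it.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for a serialized scan token (not Kudu's protobuf).
struct TokenSketch {
  std::string table_name;                    // assumed not to contain '\n'
  std::optional<uint32_t> batch_size_bytes;  // unset means "use the default"
};

// Writes the table name on the first line and, if set, the batch size on the second.
std::string Serialize(const TokenSketch& token) {
  std::ostringstream out;
  out << token.table_name << '\n';
  if (token.batch_size_bytes) out << *token.batch_size_bytes;
  return out.str();
}

// Reads the fields back; a missing second line leaves batch_size_bytes unset.
TokenSketch Deserialize(const std::string& data) {
  std::istringstream in(data);
  TokenSketch token;
  std::getline(in, token.table_name);
  uint32_t size = 0;
  if (in >> size) token.batch_size_bytes = size;
  return token;
}
```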





[jira] [Created] (KUDU-2209) HybridClock doesn't handle changes to STA_NANO status flag

2017-11-01 Thread Todd Lipcon (JIRA)
Todd Lipcon created KUDU-2209:
-

 Summary: HybridClock doesn't handle changes to STA_NANO status flag
 Key: KUDU-2209
 URL: https://issues.apache.org/jira/browse/KUDU-2209
 Project: Kudu
  Issue Type: Bug
  Components: server
Affects Versions: 1.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


Users have occasionally reported spurious crashes due to Kudu thinking that 
another node has a timestamp from the future. After some debugging I realized 
that the issue is that we currently capture the flag 'STA_NANO' from the kernel 
only at startup. This flag indicates whether the kernel's sub-second timestamp 
is in nanoseconds or microseconds. We initially assumed this was a static 
property of the kernel. However, it turns out that this flag can be toggled at 
runtime by ntp in certain circumstances. Given this, it was possible for us to 
interpret a number of nanoseconds as if it were microseconds, resulting in a 
timestamp up to 1000 seconds in the future (a sub-second value of up to 
999,999,999 ns, read as microseconds, amounts to nearly 1000 seconds).
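A minimal C++ sketch of the fix direction, assuming Linux and glibc's <sys/timex.h>: check STA_NANO on every reading instead of caching it at startup. Per the adjtimex(2) man page, when STA_NANO is set the `time.tv_usec` field of `struct timex` holds nanoseconds. SubsecondToMicros and ReadKernelTimeMicros are hypothetical helpers, not Kudu's HybridClock code.

```cpp
#include <cstdint>
#include <sys/timex.h>

// Hypothetical helper: convert the sub-second field reported by ntp_adjtime()
// into microseconds, interpreting it according to the STA_NANO bit that came
// back with this same reading.
int64_t SubsecondToMicros(int status, int64_t subsecond) {
  if (status & STA_NANO) {
    return subsecond / 1000;  // nanoseconds -> microseconds
  }
  return subsecond;  // already microseconds
}

// Hypothetical helper: read the kernel clock and return wall time in
// microseconds, or -1 if the call fails.
int64_t ReadKernelTimeMicros() {
  struct timex tx = {};
  tx.modes = 0;  // query only; do not adjust anything
  if (ntp_adjtime(&tx) == -1) return -1;
  return static_cast<int64_t>(tx.time.tv_sec) * 1000000 +
         SubsecondToMicros(tx.status, tx.time.tv_usec);
}
```

Re-reading the flag costs nothing extra, since the status bits arrive in the same ntp_adjtime() result as the time itself.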





[jira] [Assigned] (KUDU-1578) kudu-tserver should refuse service or "freeze" instead of crashing when NTP loses sync

2017-11-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-1578:
-

Assignee: Todd Lipcon

> kudu-tserver should refuse service or "freeze" instead of crashing when NTP 
> loses sync
> ---
>
> Key: KUDU-1578
> URL: https://issues.apache.org/jira/browse/KUDU-1578
> Project: Kudu
>  Issue Type: Bug
>  Components: tserver
>Reporter: zhangsong
>Assignee: Todd Lipcon
>Priority: Major
>
> Currently, kudu-tserver will crash when ntp is unsynchronized.
> However, this behavior may not be right in a large cluster, where a crash can 
> trigger re-replication, which can be useless or even harmful to cluster availability.
> Instead, kudu-tserver should suspend itself, e.g. by refusing to serve writes, 
> and let the administrator decide what to do.





[jira] [Commented] (KUDU-1578) kudu-tserver should refuse service or "freeze" instead of crashing when NTP loses sync

2017-11-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235089#comment-16235089
 ] 

Todd Lipcon commented on KUDU-1578:
---

I put up a patch at http://gerrit.cloudera.org:8080/8451 which partially 
addresses this.

In particular, I didn't go through the complexity of trying to be "partially up" 
while NTP is down. Rather, I changed the clock to ride over brief periods of 
NTP synchronization loss while logging errors. Assuming typical 
configurations, this should allow Kudu to stay up even if NTP goes out for tens 
of minutes.
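The ride-over idea can be sketched as follows. RideOverClock is hypothetical, not Kudu's clock implementation, and both the error limit and the drift rate used to widen the error bound are assumed parameters, not Kudu's defaults. While NTP is synchronized the clock records the reading and its error; while sync is lost it keeps answering, logs an error, and grows the error bound with elapsed time, failing only once the bound exceeds the limit.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical sketch of riding over NTP sync loss (not Kudu's HybridClock).
class RideOverClock {
 public:
  // max_error_us: largest error bound tolerated before refusing to serve time.
  // drift_ppm: assumed worst-case drift used to widen the bound while unsynced.
  RideOverClock(int64_t max_error_us, int64_t drift_ppm)
      : max_error_us_(max_error_us), drift_ppm_(drift_ppm) {}

  // Feeds one clock reading and sets *error_us to the current error bound.
  // Returns true if the time is still usable, false once the bound is too large.
  bool Check(bool ntp_synced, int64_t now_us, int64_t ntp_error_us,
             int64_t* error_us) {
    if (ntp_synced) {
      have_sync_ = true;
      last_sync_us_ = now_us;
      last_sync_error_us_ = ntp_error_us;
      *error_us = ntp_error_us;
      return *error_us <= max_error_us_;
    }
    if (!have_sync_) return false;  // never synchronized: nothing to extrapolate
    int64_t elapsed_us = now_us - last_sync_us_;
    *error_us = last_sync_error_us_ + elapsed_us * drift_ppm_ / 1000000;
    std::cerr << "ERROR: NTP unsynchronized for " << elapsed_us
              << " us; estimated clock error " << *error_us << " us\n";
    return *error_us <= max_error_us_;
  }

 private:
  int64_t max_error_us_;
  int64_t drift_ppm_;
  bool have_sync_ = false;
  int64_t last_sync_us_ = 0;
  int64_t last_sync_error_us_ = 0;
};
```

With an assumed 10 s limit and 500 ppm drift, ten minutes without sync widens a 1 ms bound to 1000 + 600,000,000 × 500 / 1,000,000 = 301,000 µs, still well within the limit.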



[jira] [Resolved] (KUDU-2200) Sanity-check that users specify the right number of masters when connecting

2017-11-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-2200.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

> Sanity-check that users specify the right number of masters when connecting
> ---
>
> Key: KUDU-2200
> URL: https://issues.apache.org/jira/browse/KUDU-2200
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, master, supportability
>Affects Versions: 1.6.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Fix For: 1.6.0
>
>
> A common issue I've seen is that users set up an HA master configuration (3 masters) 
> but then in various cases specify only one of the masters when they try to 
> connect using the client. This currently works if they happen to have 
> picked the leader master, and otherwise returns a somewhat confusing "no 
> leader" error message.
> We should improve usability here by having the master send back a list of the 
> master addresses in the case that it isn't the leader, and the client can 
> use this to provide a more actionable error message like "Client connection 
> specified only a subset of the cluster's masters" or somesuch.
> I wouldn't want to automatically reconfigure the client and reconnect, 
> because this puts the client in a configuration state that will fail once the 
> one master they specified goes down.
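A hypothetical C++ sketch of the client-side check proposed in this issue, assuming a non-leader master returns the full list of master addresses. DiagnoseNoLeader, its message text, and the example addresses are illustrative, not the Kudu client's actual API or error string. It compares the client's configured masters with the reported list and, when the client knows only a subset, returns an actionable message; it deliberately does not reconfigure the client.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: `configured` is what the user passed to the client,
// `reported` is the master list a non-leader master sent back.
// Returns an empty string when the configured set covers every reported master.
std::string DiagnoseNoLeader(const std::vector<std::string>& configured,
                             const std::vector<std::string>& reported) {
  std::set<std::string> known(configured.begin(), configured.end());
  std::vector<std::string> missing;
  for (const std::string& addr : reported) {
    if (known.count(addr) == 0) missing.push_back(addr);
  }
  if (missing.empty()) return "";
  std::string msg =
      "Client connection specified only a subset of the cluster's masters; "
      "missing:";
  for (const std::string& addr : missing) msg += " " + addr;
  return msg;
}
```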





[jira] [Assigned] (KUDU-1411) Implement HT timestamp propagation in the Java client

2017-11-01 Thread Hao Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao reassigned KUDU-1411:
-

Assignee: Hao Hao

> Implement HT timestamp propagation in the Java client
> -
>
> Key: KUDU-1411
> URL: https://issues.apache.org/jira/browse/KUDU-1411
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Dan Burkert
>Assignee: Hao Hao
>Priority: Major
>
> The Java client doesn't do any timestamp propagation.  The ScanToken 
> implementation should also be updated so that deserializing a ScanToken 
> results in propagating the timestamp of the serializer to the 
> deserializer.
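A minimal sketch of the propagation idea, assuming the client keeps the highest hybrid-time timestamp it has observed and a scan token carries the serializer's value. TimestampTracker and its methods are hypothetical names, not the Java or C++ client API. Updating with a maximum keeps the observed timestamp monotonic, so handing over an older token can never move a client backwards.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical tracker for the latest observed hybrid-time timestamp.
class TimestampTracker {
 public:
  // Records a timestamp seen in a server response or carried by a scan token;
  // the stored value only ever moves forward.
  void Observe(uint64_t ts) {
    uint64_t cur = last_.load();
    while (ts > cur && !last_.compare_exchange_weak(cur, ts)) {
      // On failure compare_exchange_weak reloads `cur`; retry while ts is newer.
    }
  }

  uint64_t Latest() const { return last_.load(); }

 private:
  std::atomic<uint64_t> last_{0};
};
```

In this sketch the serializing client would write Latest() into the token, and the deserializing client would call Observe() with that value before issuing its scan.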





[jira] [Updated] (KUDU-1411) Implement HT timestamp propagation in the Java client

2017-11-01 Thread Hao Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao updated KUDU-1411:
--
Code Review: https://gerrit.cloudera.org/#/c/8452/
