[jira] [Commented] (KUDU-2359) tserver should allow starting with a small number of missing data dirs

2018-04-06 Thread Andrew Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429019#comment-16429019
 ] 

Andrew Wong commented on KUDU-2359:
---

To clarify this a bit, based on some offline discussion with Adar: it's less 
that the file vanished; rather, our current code couldn't read anything from 
the instance file, and thus returned a "file not found" error. Looking at the 
file system with strace, we found that the EIO was triggered in getdents(). A 
snippet of the strace output:
{quote}{{ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, \{B38400 opost isig icanon echo ...}) = 0}}
{{ioctl(1, TIOCGWINSZ, \{ws_row=88, ws_col=357, ws_xpixel=0, ws_ypixel=0}) = 0}}
{{stat("/data/6", \{st_mode=S_IFDIR|S_ISVTX|0777, st_size=2048, ...}) = 0}}
{{open("/data/6", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3}}
{{fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)}}
{{getdents(3, 0x8ecc90, 32768) = -1 EIO (Input/output error)}}
{{open("/usr/share/locale/locale.alias", O_RDONLY) = 4}}
{{fstat(4, \{st_mode=S_IFREG|0644, st_size=2512, ...}) = 0}}{quote}
We discussed a few options, including more stringent checking around a mount 
point for failures (probing the file system for more information about the 
failure), but settled on the point that, at least for startup, treating 
missing instance files as failed instance files would give the desired 
behavior.
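
A minimal sketch of that startup behavior, written in Python rather than Kudu's C++ and not taken from the Kudu code base. The names `classify_data_dir`, `open_data_dirs`, and `max_failed` are hypothetical, and the set of errno values treated as "failed" is an assumption made for illustration:

```python
import errno
import os
from enum import Enum


class DirState(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"


def classify_data_dir(root, instance_name="instance"):
    """Return HEALTHY if the instance file can be read, FAILED otherwise."""
    path = os.path.join(root, instance_name)
    try:
        with open(path, "rb") as f:
            f.read()
    except OSError as e:
        # Assumption: a missing file (ENOENT/ENOTDIR) and a disk error (EIO)
        # are handled the same way, since a failed disk can surface as either.
        if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EIO):
            return DirState.FAILED
        raise
    return DirState.HEALTHY


def open_data_dirs(roots, max_failed=1):
    """Classify every root; refuse to start only if too many have failed."""
    states = {root: classify_data_dir(root) for root in roots}
    failed = [r for r, s in states.items() if s is DirState.FAILED]
    if len(failed) > max_failed or len(failed) == len(roots):
        raise RuntimeError("too many failed data dirs: %s" % ", ".join(failed))
    return states
```

With this shape, a server whose /data/6 mount is gone or returns EIO would still start, with that directory marked failed in memory, as long as the number of failed directories stays within the (hypothetical) limit.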

The case for update_dirs is trickier, for the reasons mentioned above. One 
implementation we considered is to treat _all_ instances that return errors 
upon loading as missing when running `kudu fs update_dirs`. As long as we 
don't do anything silly like overwrite files before knowing that the rest of 
the operation has succeeded, we _should_ be able to get away with this, since 
presumably the update will eventually fail at some point during the run of 
the tool. What we lose is the ability to short-circuit: rather than stopping 
as soon as we see a disk failure, the update tool will attempt some work 
(reads, rewrites on other drives, etc.) because it can't tell whether a 
directory is "failed" or "missing". We could add a heuristic like, "If we 
notice a failed instance, definitely do not try to update, but if we see a 
missing disk, try to update and if we can't because the disk has actually 
failed, revert everything" to make the semantics better, but for now I'll see 
how well this works.
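
The "don't overwrite prematurely, revert on failure" idea can be sketched as a stage-then-commit update. This is an illustration in Python under stated assumptions, not how `kudu fs update_dirs` is implemented: `update_instances` and the `.tmp` suffix are hypothetical, and the commit loop is not atomic across directories.

```python
import os


def update_instances(new_contents):
    """Stage every new instance file, then commit; undo staging on any error.

    new_contents maps an instance file path to the bytes it should contain.
    """
    staged = []
    try:
        for path, data in new_contents.items():
            tmp = path + ".tmp"
            # Record the temp file first so cleanup also covers partial writes.
            staged.append((tmp, path))
            with open(tmp, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
    except OSError:
        # A disk that looked "missing" turned out to be unusable: remove
        # whatever was staged and leave all existing instance files untouched.
        for tmp, _ in staged:
            try:
                os.remove(tmp)
            except OSError:
                pass
        raise
    # Only after every directory accepted its staged file do we overwrite.
    for tmp, path in staged:
        os.replace(tmp, path)
```

The cost is the one described above: the tool does real I/O on the healthy drives before discovering that one directory cannot be written.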

> tserver should allow starting with a small number of missing data dirs
> --
>
> Key: KUDU-2359
> URL: https://issues.apache.org/jira/browse/KUDU-2359
> Project: Kudu
> Issue Type: Improvement
> Components: fs, tserver
> Reporter: Todd Lipcon
> Assignee: Andrew Wong
> Priority: Major
>
> Often when a disk fails, its mount point will not come back up when the 
> server is restarted. Currently, Kudu will respond to this by failing to 
> restart with an error like:
> F0314 18:23:39.353916 112051 tablet_server_main.cc:80] Check failed: _s.ok() 
> Bad status: Already present: FS layout already exists; not overwriting 
> existing layout. See 
> https://kudu.apache.org/releases/1.8.0-SNAPSHOT/docs/troubleshooting.html: 
> unable to create file system roots: FSManager roots already exist: 
> /data/1/kudu,/data/2/kudu,/data/3/kudu,/data/5/kudu,/data/6/kudu,/data/7/kudu,/data/8/kudu,/data/1/kudu-wal
> However, this defeats some of the advantages of the "allow single disk 
> failure" work. One could use the update_data_dirs tool to remove the missing 
> disk, but you'd also need to persistently change the configuration of the 
> daemon, which is hard to do with consistent configuration management.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2359) tserver should allow starting with a small number of missing data dirs

2018-04-06 Thread Andrew Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428892#comment-16428892
 ] 

Andrew Wong commented on KUDU-2359:
---

Based on this, it probably makes sense to treat missing directories as 
"failed" directories (i.e. each should be marked "failed" in memory and all 
tablets configured to use it should be failed and re-replicated automatically). 
What does this mean for the `kudu fs update_dirs` tool, which mends missing 
directories? Its use would fall more on the side of fixing provisioning errors, 
rather than disk errors, and so it will be useful to keep around. That said, 
it'll take some thought on how to accommodate both missing directories as a 
"failed" state and missing directories as an expected state when running the 
tool.



[jira] [Commented] (KUDU-2359) tserver should allow starting with a small number of missing data dirs

2018-04-06 Thread Andrew Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428879#comment-16428879
 ] 

Andrew Wong commented on KUDU-2359:
---

I spent some time looking at a test cluster that had a few bad disks with the 
following behavior in their logs. On one of the servers, which had failed in 
Kudu 1.5 (pre-disk-failure handling), for some time following the failures, the 
server would attempt to start up and fail immediately with:

{{Fatal I/O error, context: /data/6/kudu/instance}}

After a few months of this (the server remaining down), the error changed:

{{Check failed: _s.ok() Bad status: Already present: Could not create new FS 
layout: FSManager root is not empty: /data/1/kudu}}

This message indicates that Kudu couldn't find an instance file for a data 
directory. Upon examining the FS a bit more, I noticed that 
/data/6/instance was indeed missing, but seemingly not because the disk was 
removed and replaced. Rather, it seemed that the instance file, after some time 
on the failed disk, vanished, and this is something that we need to consider.

{{cat: /data/6/kudu/instance: No such file or directory}}

{{ls: cannot access /data/6/kudu: No such file or directory}}

{{ls: reading directory /data/6: Input/output error}}



[jira] [Commented] (KUDU-2359) tserver should allow starting with a small number of missing data dirs

2018-03-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418154#comment-16418154
 ] 

Todd Lipcon commented on KUDU-2359:
---

I think the point is that, oftentimes, after a server crash, things are 
configured to automatically reboot, and upon a reboot the Kudu daemon will 
automatically restart. So, there is no operator involvement to restart a 
crashed service. Or, a non-Kudu-expert operator knows enough to see that a 
tserver has crashed and restart the service, but isn't familiar enough to start 
modifying flags, etc. Additionally, maintaining a separate set of flags on 
different daemons in a cluster gets complex.

bq. It also begs the question, would operators even care about those failed 
tablets? If our re-replication story is robust enough to handle everything on 
its own, it could be seen as a pointless configuration. I suppose exposing it 
as a flag initially would give us that sort of info.

Right, I think in the common case, you want the server to come back, and then 
it'll notice the failed 25% of tablets, and re-replicate them elsewhere. 
As it stands, it's likely the server will be down for a day or two before 
the operator figures out the right way to run the 'update_dirs' tool, etc, and 
by the time they get the server back up, everything has been 
re-replicated elsewhere already.



[jira] [Commented] (KUDU-2359) tserver should allow starting with a small number of missing data dirs

2018-03-26 Thread Andrew Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414249#comment-16414249
 ] 

Andrew Wong commented on KUDU-2359:
---

This should be doable by extending the architecture in place for the `kudu fs 
update_dirs` tool. The caveat here, and with the update tool, is that any 
tablets that are/were on the missing data directory are/should be started up in 
a failed state so they can be evicted and re-replicated elsewhere. For the 
update tool, we have operators confront this tradeoff by requiring them to 
specify the `--force` flag. Ideally a similar flag could be used here, so at 
least the mean time to recovery is gated by the time it takes to update a flag, 
rather than the time it takes to run `kudu fs update_dirs`.

It also begs the question, would operators even care about those failed 
tablets? If our re-replication story is robust enough to handle everything on 
its own, it could be seen as a pointless configuration. I suppose exposing it 
as a flag initially would give us that sort of info.



[jira] [Commented] (KUDU-2359) tserver should allow starting with a small number of missing data dirs

2018-03-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405844#comment-16405844
 ] 

Todd Lipcon commented on KUDU-2359:
---

cc [~anjuwong] for thoughts
