Re: [PATCH 2/4] btrfs-progs: Integrate error message output into find_mount_root().
Hi Qu,

(2014/07/10 12:05), Qu Wenruo wrote:
> Before this patch, find_mount_root() and the caller both output error
> messages, which sometimes makes the output duplicated and hard to judge
> what the problem is.
>
> This patch integrates all the error message output into find_mount_root()
> to give more meaningful error prompts and removes the now-unneeded caller
> error messages.
>
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
> ---
>  cmds-receive.c   |  2 --
>  cmds-send.c      |  8 +---
>  cmds-subvolume.c |  5 +
>  utils.c          | 15 ---
>  4 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/cmds-receive.c b/cmds-receive.c
> index 48380a5..084d97d 100644
> --- a/cmds-receive.c
> +++ b/cmds-receive.c
> @@ -981,8 +981,6 @@ static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd,
>  	ret = find_mount_root(dest_dir_full_path, &r->root_path);
>  	if (ret < 0) {
>  		ret = -EINVAL;
> -		fprintf(stderr, "ERROR: failed to determine mount point "
> -			"for %s\n", dest_dir_full_path);
>  		goto out;
>  	}
>  	r->mnt_fd = open(r->root_path, O_RDONLY | O_NOATIME);
> diff --git a/cmds-send.c b/cmds-send.c
> index 9a73b32..091f32b 100644
> --- a/cmds-send.c
> +++ b/cmds-send.c
> @@ -357,8 +357,6 @@ static int init_root_path(struct btrfs_send *s, const char *subvol)
>  	ret = find_mount_root(subvol, &s->root_path);
>  	if (ret < 0) {
>  		ret = -EINVAL;
> -		fprintf(stderr, "ERROR: failed to determine mount point "
> -			"for %s\n", subvol);
>  		goto out;
>  	}
>
> @@ -622,12 +620,8 @@ int cmd_send(int argc, char **argv)
>  	}
>  	ret = find_mount_root(subvol, &mount_root);
> -	if (ret < 0) {
> -		fprintf(stderr, "ERROR: find_mount_root failed on %s: "
> -			"%s\n", subvol, strerror(-ret));
> +	if (ret < 0)
>  		goto out;
> -	}
>  	if (strcmp(send.root_path, mount_root) != 0) {
>  		ret = -EINVAL;
>  		fprintf(stderr, "ERROR: all subvols must be from the
> diff --git a/cmds-subvolume.c b/cmds-subvolume.c
> index 639fb10..b252eab 100644
> --- a/cmds-subvolume.c
> +++ b/cmds-subvolume.c
> @@ -981,11 +981,8 @@ static int cmd_subvol_show(int argc, char **argv)
>  	}
>  	ret = find_mount_root(fullpath, &mnt);
> -	if (ret < 0) {
> -		fprintf(stderr, "ERROR: find_mount_root failed on %s: "
> -			"%s\n", fullpath, strerror(-ret));
> +	if (ret < 0)
>  		goto out;
> -	}
>  	ret = 1;
>  	svpath = get_subvol_name(mnt, fullpath);
> diff --git a/utils.c b/utils.c
> index 507ec6c..07173ee 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -2417,13 +2417,19 @@ int find_mount_root(const char *path, char **mount_root)
>  	char *longest_match = NULL;
>
>  	fd = open(path, O_RDONLY | O_NOATIME);
> -	if (fd < 0)
> +	if (fd < 0) {
> +		fprintf(stderr, "ERROR: Failed to open %s: %s\n",
> +			path, strerror(errno));

It drops part of the original message: it no longer shows that this error
came from find_mount_root(). I think the original meaning should be kept
as is. What do you think?

Thanks,
Satoru

>  		return -errno;
> +	}
>  	close(fd);
>
>  	mnttab = setmntent("/proc/self/mounts", "r");
> -	if (!mnttab)
> +	if (!mnttab) {
> +		fprintf(stderr, "ERROR: Failed to setmntent: %s\n",
> +			strerror(errno));
>  		return -errno;
> +	}
>
>  	while ((ent = getmntent(mnttab))) {
>  		len = strlen(ent->mnt_dir);
> @@ -2457,8 +2463,11 @@ int find_mount_root(const char *path, char **mount_root)
>  	ret = 0;
>  	*mount_root = realpath(longest_match, NULL);
> -	if (!*mount_root)
> +	if (!*mount_root) {
> +		fprintf(stderr, "Failed to resolve path %s: %s\n",
> +			longest_match, strerror(errno));
>  		ret = -errno;
> +	}
>
>  	free(longest_match);
>  	return ret;

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] btrfs-progs: Fix wrong indent in btrfs-progs.
(2014/07/10 12:05), Qu Wenruo wrote:
> When editing cmds-filesystem.c, I found cmd_filesystem_df() uses 7 spaces
> as indent instead of 1 tab (or 8 spaces), which makes the indentation
> quite embarrassing.
>
> Such a problem is especially hard to detect when reviewing patches, since
> the leading '+' makes a tab only 7 spaces long, making 7 spaces look the
> same as a tab.
>
> This patch fixes all the 7-space indents.
>
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
> Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
> ---
>  cmds-filesystem.c | 79 +++
>  ctree.h           | 15 ++-
>  utils.c           | 10 +++
>  3 files changed, 52 insertions(+), 52 deletions(-)
>
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 4b2d27e..0a9b62a 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -114,23 +114,23 @@ static const char * const filesystem_cmd_group_usage[] = {
>  };
>
>  static const char * const cmd_filesystem_df_usage[] = {
> -       "btrfs filesystem df <path>",
> -       "Show space usage information for a mount point",
> -       NULL
> +	"btrfs filesystem df <path>",
> +	"Show space usage information for a mount point",
> +	NULL
>  };
>
>  static void print_df(struct btrfs_ioctl_space_args *sargs)
>  {
> -       u64 i;
> -       struct btrfs_ioctl_space_info *sp = sargs->spaces;
> -
> -       for (i = 0; i < sargs->total_spaces; i++, sp++) {
> -               printf("%s, %s: total=%s, used=%s\n",
> -                      group_type_str(sp->flags),
> -                      group_profile_str(sp->flags),
> -                      pretty_size(sp->total_bytes),
> -                      pretty_size(sp->used_bytes));
> -       }
> +	u64 i;
> +	struct btrfs_ioctl_space_info *sp = sargs->spaces;
> +
> +	for (i = 0; i < sargs->total_spaces; i++, sp++) {
> +		printf("%s, %s: total=%s, used=%s\n",
> +			group_type_str(sp->flags),
> +			group_profile_str(sp->flags),
> +			pretty_size(sp->total_bytes),
> +			pretty_size(sp->used_bytes));
> +	}
>  }
>
>  static int get_df(int fd, struct btrfs_ioctl_space_args **sargs_ret)
> @@ -183,33 +183,32 @@ static int get_df(int fd, struct btrfs_ioctl_space_args **sargs_ret)
>
>  static int cmd_filesystem_df(int argc, char **argv)
>  {
> -       struct btrfs_ioctl_space_args *sargs = NULL;
> -       int ret;
> -       int fd;
> -       char *path;
> -       DIR *dirstream = NULL;
> -
> -       if (check_argc_exact(argc, 2))
> -               usage(cmd_filesystem_df_usage);
> -
> -       path = argv[1];
> -
> -       fd = open_file_or_dir(path, &dirstream);
> -       if (fd < 0) {
> -               fprintf(stderr, "ERROR: can't access '%s'\n", path);
> -               return 1;
> -       }
> -       ret = get_df(fd, &sargs);
> -
> -       if (!ret && sargs) {
> -               print_df(sargs);
> -               free(sargs);
> -       } else {
> -               fprintf(stderr, "ERROR: get_df failed %s\n", strerror(-ret));
> -       }
> -
> -       close_file_or_dir(fd, dirstream);
> -       return !!ret;
> +	struct btrfs_ioctl_space_args *sargs = NULL;
> +	int ret;
> +	int fd;
> +	char *path;
> +	DIR *dirstream = NULL;
> +
> +	if (check_argc_exact(argc, 2))
> +		usage(cmd_filesystem_df_usage);
> +
> +	path = argv[1];
> +
> +	fd = open_file_or_dir(path, &dirstream);
> +	if (fd < 0) {
> +		fprintf(stderr, "ERROR: can't access '%s'\n", path);
> +		return 1;
> +	}
> +	ret = get_df(fd, &sargs);
> +	if (!ret && sargs) {
> +		print_df(sargs);
> +		free(sargs);
> +	} else {
> +		fprintf(stderr, "ERROR: get_df failed %s\n", strerror(-ret));
> +	}
> +
> +	close_file_or_dir(fd, dirstream);
> +	return !!ret;
>  }
>
>  static int match_search_item_kernel(__u8 *fsid, char *mnt, char *label,
> diff --git a/ctree.h b/ctree.h
> index 35d3633..83d85b3 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -939,10 +939,10 @@ struct btrfs_block_group_cache {
>  };
>
>  struct btrfs_extent_ops {
> -       int (*alloc_extent)(struct btrfs_root *root, u64 num_bytes,
> -                           u64 hint_byte, struct btrfs_key *ins);
> -       int (*free_extent)(struct btrfs_root *root, u64 bytenr,
> -                          u64 num_bytes);
> +	int (*alloc_extent)(struct btrfs_root *root, u64 num_bytes,
> +			u64 hint_byte, struct btrfs_key *ins);
> +	int (*free_extent)(struct btrfs_root *root, u64 bytenr,
> +			u64 num_bytes);
>  };
>
>  struct btrfs_device;
> @@ -2117,9 +2117,10 @@ BTRFS_SETGET_STACK_FUNCS(stack_qgroup_limit_rsv_exclusive,
>  static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
>                                                     struct btrfs_item *e)
>  {
> -       unsigned long offset;
> -       offset = offsetof(struct btrfs_file_extent_item, disk_bytenr);
> -       return btrfs_item_size(eb, e) - offset;
> +	unsigned long
Re: [PATCH 2/4] btrfs-progs: Integrate error message output into find_mount_root().
Takeuchi-san

On Thu, 10 Jul 2014 16:33:23 +0900, Satoru Takeuchi wrote:
> (2014/07/10 12:05), Qu Wenruo wrote:
>> Before this patch, find_mount_root() and the caller both output error
>> messages, which sometimes makes the output duplicated and hard to judge
>> what the problem is.
>>
>> This patch integrates all the error message output into find_mount_root()
>> to give more meaningful error prompts and removes the now-unneeded caller
>> error messages.
>>
>> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
>> [...]
>> @@ -2417,13 +2417,19 @@ int find_mount_root(const char *path, char **mount_root)
>>  	char *longest_match = NULL;
>>
>>  	fd = open(path, O_RDONLY | O_NOATIME);
>> -	if (fd < 0)
>> +	if (fd < 0) {
>> +		fprintf(stderr, "ERROR: Failed to open %s: %s\n",
>> +			path, strerror(errno));
>
> It drops part of the original message: it no longer shows that this error
> came from find_mount_root(). I think the original meaning should be kept
> as is. What do you think?

I think it is strange for common users to be shown the name of an internal
function. Maybe we should introduce two kinds of messages: one for common
users, and the other for developers to debug.

Thanks
Miao

> Thanks,
> Satoru
> [...]
Re: [PATCH 2/4] btrfs-progs: Integrate error message output into find_mount_root().
-------- Original Message --------
Subject: Re: [PATCH 2/4] btrfs-progs: Integrate error message output into find_mount_root().
From: Miao Xie mi...@cn.fujitsu.com
To: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com, Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
Date: 2014/07/10 16:10

> Takeuchi-san
>
> On Thu, 10 Jul 2014 16:33:23 +0900, Satoru Takeuchi wrote:
>> (2014/07/10 12:05), Qu Wenruo wrote:
>>> Before this patch, find_mount_root() and the caller both output error
>>> messages, which sometimes makes the output duplicated and hard to judge
>>> what the problem is.
>>> [...]
>>
>> It drops part of the original message: it no longer shows that this error
>> came from find_mount_root(). I think the original meaning should be kept
>> as is. What do you think?
>
> I think it is strange for common users to be shown the name of an internal
> function. Maybe we should introduce two kinds of messages: one for common
> users, and the other for developers to debug.
>
> Thanks
> Miao

I agree with Miao's idea.

It's true that some developers need to get info from the output, but IMO
the error messages are mostly used to indicate what *users* did wrong,
since most problems are caused by wrong parameters given by users.

For example, I always forget to run 'btrfs fi df /mnt' as root, and the
'Operation not permitted' message makes me realize the permission problem.
The function name and other such details are less important than that.

On the other hand, if developers encounter problems, they will gdb the
program or grep the source to find out the problem. So a function name in
the error message seems not so necessary to me.

It would also be a great idea to add a new framework for showing debug
messages, but I'd prefer to build that framework some time later (maybe
when btrfs-progs becomes more complicated than it currently is?).

Thanks,
Qu

> Thanks,
> Satoru
> [...]
Re: btrfs RAID with enterprise SATA or SAS drives
On Thursday, 10 July 2014, 12:10:46, Russell Coker wrote:
> On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
>>> - for someone using SAS or enterprise SATA drives with Linux, I
>>> understand btrfs gives the extra benefit of checksums; are there any
>>> other specific benefits over using mdadm or dmraid?
>>
>> I think I can answer this one. The most important advantage, I think, is
>> that BTRFS is aware of which blocks of the RAID are in use and need to
>> be synced:
>>
>> - Instant initialization of the RAID regardless of size (unless at some
>>   capacity mkfs.btrfs needs more time)
>
> From mdadm(8):
>
>        --assume-clean
>               Tell mdadm that the array pre-existed and is known to be
>               clean. It can be useful when trying to recover from a major
>               failure as you can be sure that no data will be affected
>               unless you actually write to the array. It can also be used
>               when creating a RAID1 or RAID10 if you want to avoid the
>               initial resync, however this practice — while normally
>               safe — is not recommended. Use this only if you really
>               know what you are doing.
>
>               When the devices that will be part of a new array were
>               filled with zeros before creation the operator knows the
>               array is actually clean. If that is the case, such as after
>               running badblocks, this argument can be used to tell mdadm
>               the facts the operator knows.
>
> While it might be regarded as a hack, it is possible to do a fairly
> instant initialisation of a Linux software RAID-1.

It is not the same. BTRFS doesn't care if the data of the unused blocks
differ. The RAID is on the *filesystem* level, not on the raw block level.
The data on the two disks don't even have to be located in the exact same
sectors.

>> - Rebuild after a disk failure or disk replace will only copy *used*
>>   blocks
>
> Have you done any benchmarks on this? The down-side of copying used
> blocks is that you first need to discover which blocks are used. Given
> that seek time is a major bottleneck, at some portion of space used it
> will be faster to just copy the entire disk.

As BTRFS operates the RAID on the filesystem level, it already knows which
blocks are in use. I never had a disk replace or a faulty disk yet in my
two RAID-1 arrays, so I have no measurements. It may depend on free space
fragmentation.

>> Scrubbing can repair from the good disk if the RAID has redundancy, but
>> SoftRAID should be able to do this as well. But also for scrubbing:
>> BTRFS only checks and repairs used blocks.
>
> When you scrub Linux Software RAID (and in fact pretty much every RAID)
> it will only correct errors that the disks flag. If a disk returns bad
> data and says that it's good then the RAID scrub will happily copy the
> bad data over the good data (for a RAID-1) or generate new valid parity
> blocks for bad data (for RAID-5/6).
>
> http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
>
> Page 12 of the above document says that nearline disks (IE the ones
> people like me can afford for home use) have a 0.466% incidence of
> returning bad data and claiming it's good in a year. Currently I run
> about 20 such disks in a variety of servers, workstations, and laptops.
> Therefore the probability of having no such errors on all those disks
> would be .99534^20=.91081. The probability of having no such errors over
> a period of 10 years would be (.99534^20)^10=.39290, which means that
> over 10 years I should expect to have such errors, which is why BTRFS
> RAID-1 and DUP metadata on single disks are necessary features.

Yeah, the checksums come in handy here.
(excuse the long signature, it's added by the server)

Ciao,
--
Martin Steigerwald
Consultant / Trainer

teamix GmbH
Südwestpark 43
90449 Nürnberg

fon: +49 911 30999 55
fax: +49 911 30999 99
mail: martin.steigerw...@teamix.de
web: http://www.teamix.de
blog: http://blog.teamix.de

Amtsgericht Nürnberg, HRB 18320
Geschäftsführer: Oliver Kügow, Richard Müller

** JETZT ANMELDEN – teamix TechDemo - 23.07.2014 - http://www.teamix.de/techdemo **
Re: btrfs RAID with enterprise SATA or SAS drives
On 2014-07-09 22:10, Russell Coker wrote:
> On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
>>> - for someone using SAS or enterprise SATA drives with Linux, I
>>> understand btrfs gives the extra benefit of checksums; are there any
>>> other specific benefits over using mdadm or dmraid?
>>
>> I think I can answer this one. The most important advantage, I think, is
>> that BTRFS is aware of which blocks of the RAID are in use and need to
>> be synced:
>>
>> - Instant initialization of the RAID regardless of size (unless at some
>>   capacity mkfs.btrfs needs more time)
>
> From mdadm(8):
>
>        --assume-clean
>               [...]
>
> While it might be regarded as a hack, it is possible to do a fairly
> instant initialisation of a Linux software RAID-1.

This has the notable disadvantage, however, that the first scrub you run
will essentially perform a full resync if you didn't make sure that the
disks had identical data to begin with.

>> - Rebuild after a disk failure or disk replace will only copy *used*
>>   blocks
>
> Have you done any benchmarks on this? The down-side of copying used
> blocks is that you first need to discover which blocks are used. Given
> that seek time is a major bottleneck, at some portion of space used it
> will be faster to just copy the entire disk.

I haven't done any tests on BTRFS in this regard, but I've seen a disk
replacement on ZFS run significantly slower than a dd of the block device
would. That said, this isn't really a good comparison, for two reasons:

1. EVERYTHING on ZFS (or any filesystem that tries to do that much work)
   is slower than a dd of the raw block device.
2. Even if the throughput is lower, this is only really an issue if the
   disk is more than half full, because you don't copy the unused blocks.

Also, while it isn't really a recovery situation, I recently upgraded from
a 2x1TB-disk BTRFS RAID1 setup to a 4x1TB-disk BTRFS RAID10 setup, and the
performance of the re-balance really wasn't all that bad. I have maybe
100GB of actual data, so the array started out roughly 10% full, and the
re-balance only took about 2 minutes. Of course, it probably helps that I
make a point of keeping my filesystems de-fragmented, scrub and balance
regularly, and don't use a lot of sub-volumes or snapshots, so the
filesystem in question is not too different from what it would have looked
like if I had just wiped the FS and restored from a backup.

>> Scrubbing can repair from the good disk if the RAID has redundancy, but
>> SoftRAID should be able to do this as well. But also for scrubbing:
>> BTRFS only checks and repairs used blocks.
>
> When you scrub Linux Software RAID (and in fact pretty much every RAID)
> it will only correct errors that the disks flag. If a disk returns bad
> data and says that it's good then the RAID scrub will happily copy the
> bad data over the good data (for a RAID-1) or generate new valid parity
> blocks for bad data (for RAID-5/6).
>
> http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
>
> Page 12 of the above document says that nearline disks (IE the ones
> people like me can afford for home use) have a 0.466% incidence of
> returning bad data and claiming it's good in a year. Currently I run
> about 20 such disks in a variety of servers, workstations, and laptops.
> Therefore the probability of having no such errors on all those disks
> would be .99534^20=.91081. The probability of having no such errors over
> a period of 10 years would be (.99534^20)^10=.39290, which means that
> over 10 years I should expect to have such errors, which is why BTRFS
> RAID-1 and DUP metadata on single disks are necessary features.
Re: [PATCH RESEND 1/4] btrfs-progs: Check fstype in find_mount_root()
On Thursday, 10 July 2014, 11:05:10, Qu Wenruo wrote:
> When calling find_mount_root(), the caller in fact wants to find the
> mount point of *BTRFS*.
>
> So also check ent->fstype in find_mount_root() and output proper error
> messages if needed.
>
> This will suppress a lot of "Inappropriate ioctl for device" error
> messages.
>
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
> ---
>  utils.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/utils.c b/utils.c
> index 993d085..507ec6c 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -2412,6 +2412,7 @@ int find_mount_root(const char *path, char **mount_root)
>  	struct mntent *ent;
>  	int len;
>  	int ret;
> +	int not_btrfs;
>  	int longest_matchlen = 0;
>  	char *longest_match = NULL;
>
> @@ -2432,6 +2433,10 @@ int find_mount_root(const char *path, char **mount_root)
>  			free(longest_match);
>  			longest_matchlen = len;
>  			longest_match = strdup(ent->mnt_dir);
> +			if (strcmp(ent->mnt_type, "btrfs"))
> +				not_btrfs = 1;
> +			else
> +				not_btrfs = 0;
>  		}
>  	}
>
> @@ -2443,6 +2448,12 @@ int find_mount_root(const char *path, char **mount_root)
>  			path);
>  		return -ENOENT;
>  	}
> +	if (not_btrfs) {
> +		fprintf(stderr,
> +			"ERROR: %s does not belong to a btrfs mount points.\n",

Just a typo: "mount point"

> +			path);
> +		return -EINVAL;
> +	}
>
>  	ret = 0;
>  	*mount_root = realpath(longest_match, NULL);

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: File server structure suggestion
On 07/10/2014 04:41 PM, Andrew Flerchinger wrote:
> what was going on. That sold me on the idea of data checksums, but I'd
> rather stay in Linux than BSD, and I previously made use of online
> capacity expansion as needed, which ZFS doesn't support.

What do you mean by that? What ZFS doesn't support is reducing a pool.

tamas
Re: File server structure suggestion
I want to increase the size of the vdev, not just the zpool. I want to
make a 3-drive array into a 4-drive array by adding a single drive while
still having one parity stripe across all data. Adding more vdevs to a
zpool isn't quite the same thing as online capacity expansion.

It's not something most businesses would do, which is why the feature
never made it into ZFS, but consumers are cheap and mdadm supported it.
From what I've read, btrfs supports it, too.

On Thu, Jul 10, 2014 at 10:52 AM, Tamas Papp tom...@martos.bme.hu wrote:
> On 07/10/2014 04:41 PM, Andrew Flerchinger wrote:
>> what was going on. That sold me on the idea of data checksums, but I'd
>> rather stay in Linux than BSD, and I previously made use of online
>> capacity expansion as needed, which ZFS doesn't support.
>
> What do you mean by that? What ZFS doesn't support is reducing a pool.
>
> tamas
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
Hey Dave, thanks for the patch review! Pretty much all of what you wrote
sounds good to me; there are just one or two items I wanted to clarify.
Those comments are inline.

Thanks again,

On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
>> +
>> +# Enable qgroups now that we have our filesystem prepared. This
>> +# will kick off a scan which we will have to wait for below.
>> +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
>> +sleep 30
>
> That seems rather arbitrary. The sleeps you are adding add well over a
> minute to the runtime, and a quota scan of a filesystem with 200 files
> should be almost instantaneous.

Yeah, I'll bring that back down to 5 seconds? It's 30 in my testing
because I was being paranoid and neglected to update it for the rest of
the world.

>> +_scratch_unmount
>> +_scratch_mount
>
> What is the purpose of this?

This is kind of 'maximum paranoia', again from my own test script. The
idea was to make _absolutely_ certain that all metadata found its way to
disk and won't have its layout optimized any more. There's a decent chance
it doesn't do anything, but it doesn't seem a huge deal. I wasn't clear
though: do you want it removed, or can I comment it for clarity?

>> +# Ok, delete the snapshot we made previously. Since btrfs drop
>> +# snapshot is a delayed action with no way to force it, we have to
>> +# impose another sleep here.
>> +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1
>> +sleep 45
>
> That's indicative of a bug, yes?

No, that's just how it happens. In fact, if you unmount while a snapshot
is being dropped, the progress of the drop will be recorded and it will be
continued on the next mount. However, since we *must* have the
drop_snapshot complete for this test, I have the large sleep.

Unlike the previous sleep, I don't think this one can be reduced by much :(
	--Mark

--
Mark Fasheh
[Question] disk_bytenr with multiple devices
When a btrfs filesystem has multiple devices (e.g. /dev/sdb, /dev/sdc),
how should I interpret disk_bytenr in btrfs_file_extent_item? Does it
depend on the striping config? Say I used raid0: then will disk_bytenr
0~64K be on /dev/sdb, and 64K~128K on /dev/sdc?

Thanks,
Zhe
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
>> On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
>>> +
>>> +# Enable qgroups now that we have our filesystem prepared. This
>>> +# will kick off a scan which we will have to wait for below.
>>> +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
>>> +sleep 30
>>
>> That seems rather arbitrary. The sleeps you are adding add well over a
>> minute to the runtime, and a quota scan of a filesystem with 200 files
>> should be almost instantaneous.
>
> Yeah, I'll bring that back down to 5 seconds?

How long does it usually take? What interfaces would be needed for this to
work precisely so we don't have to play this game ever again?

- z
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> [...]
> How long does it usually take? What interfaces would be needed for this
> to work precisely so we don't have to play this game ever again?

Well, there's also the 'sleep 45' below, because we need to be certain
that btrfs_drop_snapshot gets run. This was all a bit of a pain during
debugging, to be honest.

So in my experience, an interface to make debugging easier would involve
running every delayed action in the file system to completion, including a
sync of dirty blocks to disk. In theory, this would include any delayed
actions that were kicked off as a result of the actions you are syncing.
You'd do it all from a point in time, of course, so that we don't spin
forever on a busy filesystem. I do not know whether this is feasible.

Given something like that, you'd just replace the calls to sleep with
'btrfs fi synctheworldandwait' and know that on return, the actions you
just queued up had completed.
	--Mark

--
Mark Fasheh
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote: On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote: On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote: On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote: On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote: + +# Enable qgroups now that we have our filesystem prepared. This +# will kick off a scan which we will have to wait for below. +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT +sleep 30 That seems rather arbitrary. The sleeps you are adding add well over a minute to the runtime, and a quota scan of a filesystem with 200 files should be almost instantaneous. Yeah I'll bring that back down to 5 seconds? How long does it usually take? What interfaces would be needed for this to work precisely so we don't have to play this game ever again? Well there's also the 'sleep 45' below because we need to be certain that btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging to be honest. Yeah. It seems like there's an opportunity for sync flags in the commands. - z
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 12:05:05PM -0700, Zach Brown wrote: On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote: On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote: On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote: On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote: On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote: + +# Enable qgroups now that we have our filesystem prepared. This +# will kick off a scan which we will have to wait for below. +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT +sleep 30 That seems rather arbitrary. The sleeps you are adding add well over a minute to the runtime, and a quota scan of a filesystem with 200 files should be almost instantaneous. Yeah I'll bring that back down to 5 seconds? How long does it usually take? What interfaces would be needed for this to work precisely so we don't have to play this game ever again? Well there's also the 'sleep 45' below because we need to be certain that btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging to be honest. Yeah. It seems like there's an opportunity for sync flags in the commands. Yep, that would've helped. --Mark -- Mark Fasheh
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote: Hey Dave, thanks for the patch review! Pretty much all of what you wrote sounds good to me, there's just one or two items I wanted to clarify - those comments are inline. Thanks again, On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote: On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote: + +# Enable qgroups now that we have our filesystem prepared. This +# will kick off a scan which we will have to wait for below. +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT +sleep 30 That seems rather arbitrary. The sleeps you are adding add well over a minute to the runtime, and a quota scan of a filesystem with 200 files should be almost instantaneous. Yeah I'll bring that back down to 5 seconds? It's 30 from my testing because I was being paranoid and neglected to update it for the rest of the world. Be nice to have the btrfs command wait for it to complete. Not being able to query the status of background work or wait for it is somewhat user unfriendly. If you could poll, then a 1s sleep in a poll loop would be fine. Short of that, then I guess sleep 5 is the best we can do. +_scratch_unmount +_scratch_mount What is the purpose of this? This is kind of 'maximum paranoia' again from my own test script. The idea was to make _absolutely_ certain that all metadata found its way to disk and won't be optimized in layout any more. There's a decent chance it doesn't do anything but it doesn't seem a huge deal. I wasn't clear though - do you want it removed or can I comment it for clarity? Comment. If someone reads the test in 2 years time they won't have to ask wtf?... +# Ok, delete the snapshot we made previously. Since btrfs drop +# snapshot is a delayed action with no way to force it, we have to +# impose another sleep here. +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1 +sleep 45 That's indicative of a bug, yes? No, that's just how it happens. 
In fact, if you unmount while a snapshot is being dropped, progress of the drop will be recorded and it will be continued on the next mount. However, since we *must* have the drop_snapshot complete for this test I have the large sleep. Unlike the previous sleep I don't think this can be reduced by much :( Right, again the can't-wait-or-poll-for-status-of-background-work issue comes up. That's the bug in the UI I was referring to. I guess that we'll just have to wait for a long time here. Pretty hacky, though... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: File server structure suggestion
Andrew Flerchinger posted on Thu, 10 Jul 2014 10:41:02 -0400 as excerpted: Enter btrfs. Unfortunately, it's newer than ZFS and isn't as robust, but it does support online capacity expansion, and the on-disk format is expected to be stable. It has data checksums and COW, which are the primary things I'm after. RAID10 seems pretty stable, but RAID56 isn't. So I'm looking for a suggestion. My end goal is RAID6 and to expand it a drive at a time as needed. For right now, I can either: 1) Run RAID6, but be aware of its limitations. I can manually remove and add drives in separate steps if needed. Keep the server on a UPS to limit unexpected shutdowns and any corruption there. The whole array can't be scrubbed, but if there is a checksum problem when reading individual data, will that still be corrected and/or logged? This will be a temporary situation, as over time, more features will be built out, and the existing file system will be better supported. 2) Run RAID10, and convert the file system to RAID6 later once it is stable. Since RAID10 is far more stable and feature complete than RAID56 right now, all features will work okay, I'm just buying more drives/running at lower capacity for the moment. If I have to grow the array, I'd have to buy two drives. In the future, once RAID6 is better supported, I can convert in-place to RAID6. I'd personally consider btrfs raid5/6 to be in practice a slow and lower capacity raid0, at this point, except that you'll get raid5/6 for free when that's fully supported, since it has been doing the writing for that all along, it just couldn't properly restore. IOW, I wouldn't consider it trustworthy at all against loss of a device, which based on your suggestion, isn't appropriate for your usage. That leaves either raid10 or raid1. It's worth noting that btrfs raid1 is at this point paired mirrors only, so no matter how many devices, you still have exactly two mirrors of all (meta)data. N-way-mirroring is planned for after raid5/6 completion. 
Which could put raid1 in the running for you, and as the simplest redundant raid, it might be easier to convert to raid5/6 later. Then there's raid10, which takes more drives and is faster, but is still limited to two mirrors. But while I haven't actually used raid10 myself, I do /not/ believe it's limited to pair-at-a-time additions. I believe it'll take, for instance five devices, just fine, staggering chunk allocation as necessary to fill all at about the same rate. -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
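Duncan's point about five-device raid10 — chunk allocation staggering across however many devices exist so they fill at about the same rate — can be illustrated with a toy allocator. This is a sketch of a generic most-free-space-first strategy (an assumption for illustration, not the actual btrfs allocator code):

```python
def allocate_chunks(free_space, n_chunks, chunk_size=1, stripes=4):
    """Toy model of staggered chunk allocation: each raid10-style
    chunk needs `stripes` slots (e.g. 2 data stripes x 2 mirrors),
    at most one slot per device, and always lands on the devices
    with the most free space -- so any device count >= `stripes`
    works, not just multiples of two."""
    free = dict(free_space)
    for _ in range(n_chunks):
        # pick the `stripes` devices with the most room, one slot each
        targets = sorted(free, key=free.get, reverse=True)[:stripes]
        if len(targets) < stripes or any(free[d] < chunk_size for d in targets):
            raise RuntimeError("ENOSPC: not enough devices with free space")
        for d in targets:
            free[d] -= chunk_size
    return free
```

With five equal devices, each chunk lands on the four currently-emptiest ones, so after a few chunks all five have been drawn down evenly — the staggering Duncan describes.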
Re: [PATCH] xfstests/btrfs: add test for quota groups and drop snapshot
Mark Fasheh posted on Thu, 10 Jul 2014 12:00:55 -0700 as excerpted: Given something like that, you'd just replace the calls to sleep with 'btrfs fi synctheworldandwait' and know that on return, the actions you just queued up completed. I'll admit to not really knowing what I'm talking about here, but on first intuition, What about either btrfs filesystem sync calls, or mounting with the synctime (don't have time to look up the specific option ATM) mount option? Normal sync is 30 seconds, but the mount option can be used to make that 5 seconds or whatever. And I don't know whether btrfs filesystem sync is synchronous or not. But that might help reduce it below 30 and 45 seconds, anyway, even if some sleep is still required. -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
RE: Using serialized BTRFS snapshots as a backup
I'm trying to use serialized BTRFS snapshots as a backup system. The problem is that I don't know how to avoid sending duplicate data and also have the ability to prune old backups. Specifically I've considered the following: #snapshot btrfs subvolume snapshot -r live-volume volume-date #serialize snapshot for transmission to remote machine btrfs send -f backup.date volume-date -p volume-yesterday However this means I have to keep every serialized snapshot forever. I've tried unpacking these incremental snapshots, deleting intermediate volumes, and repacking the latest version. Unfortunately, deleting an intermediate snapshot appears to change the IDs for later snapshots, and future serialized snapshots can't be unpacked. In other words, I have incremental snapshots 1-4, I unpack 1-3, erase 2, and now I can't unpack 4 due to: ERROR: could not find parent subvolume. I've also considered keeping a chain of incremental monthly backups, and basing daily backups on both the monthly and previous daily. This would allow me to delete daily backups in the future, but now I have to send twice as much data to the backup machine. What bothers me is that subvolume 3 (from the example above) really has all the same data before and after I delete subvolume 2. The IDs of the volume change, preventing me from unpacking incremental 4, and that's causing my problems. Are there any better ideas I haven't thought of? Currently running BTRFS 3.12 on kernel 3.13 (Ubuntu 14.04). Thanks, David Player
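The monthly/daily scheme described above can at least be reasoned about mechanically: each serialized stream depends only on its `-p` parent, so a stored stream is prunable exactly when it is off the parent path of every snapshot you still want to restore. A small sketch of that bookkeeping (hypothetical snapshot names; it models only the parent chain, not the ID/UUID-rewriting problem described above):

```python
def streams_needed(target, parents):
    """Walk an incremental-send chain back to the initial full send.
    `parents` maps snapshot name -> its `-p` parent (None for the
    initial full 'btrfs send').  Returns the streams that must be
    kept, oldest first, to reconstruct `target`; dropping any stream
    on this path is what produces 'ERROR: could not find parent
    subvolume' on receive.  (A model of the bookkeeping, not btrfs.)"""
    chain = []
    snap = target
    while snap is not None:
        chain.append(snap)
        snap = parents[snap]
    return list(reversed(chain))
```

For example, with dailies parented on the latest monthly rather than on each other, restoring the newest daily never needs the older dailies, so their streams can be deleted — at the cost of each daily stream carrying a full month's worth of changes.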
Re: [PATCH 2/4] btrfs-progs: Integrate error message output into find_mount_root().
(2014/07/10 17:26), Qu Wenruo wrote: Original Message Subject: Re: [PATCH 2/4] btrfs-progs: Integrate error message output into find_mount_root(). From: Miao Xie mi...@cn.fujitsu.com To: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com, Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2014年07月10日 16:10 Takeuchi-san On Thu, 10 Jul 2014 16:33:23 +0900, Satoru Takeuchi wrote: (2014/07/10 12:05), Qu Wenruo wrote: Before this patch, find_mount_root() and the caller both output error messages, which sometimes makes the output duplicated and hard to judge what the problem is. This patch will integrate all the error message output into find_mount_root() to give a more meaningful error prompt and remove the unneeded caller error messages. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- cmds-receive.c | 2 -- cmds-send.c | 8 +--- cmds-subvolume.c | 5 + utils.c | 15 --- 4 files changed, 14 insertions(+), 16 deletions(-) diff --git a/cmds-receive.c b/cmds-receive.c index 48380a5..084d97d 100644 --- a/cmds-receive.c +++ b/cmds-receive.c @@ -981,8 +981,6 @@ static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd, ret = find_mount_root(dest_dir_full_path, &r->root_path); if (ret < 0) { ret = -EINVAL; -fprintf(stderr, "ERROR: failed to determine mount point " -"for %s\n", dest_dir_full_path); goto out; } r->mnt_fd = open(r->root_path, O_RDONLY | O_NOATIME); diff --git a/cmds-send.c b/cmds-send.c index 9a73b32..091f32b 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -357,8 +357,6 @@ static int init_root_path(struct btrfs_send *s, const char *subvol) ret = find_mount_root(subvol, &s->root_path); if (ret < 0) { ret = -EINVAL; -fprintf(stderr, "ERROR: failed to determine mount point " -"for %s\n", subvol); goto out; } @@ -622,12 +620,8 @@ int cmd_send(int argc, char **argv) } ret = find_mount_root(subvol, &mount_root); -if (ret < 0) { -fprintf(stderr, "ERROR: find_mount_root failed on %s: " -"%s\n", subvol, -strerror(-ret)); +if (ret < 0) goto out; -} if (strcmp(send.root_path, 
mount_root) != 0) { ret = -EINVAL; fprintf(stderr, "ERROR: all subvols must be from the" diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 639fb10..b252eab 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -981,11 +981,8 @@ static int cmd_subvol_show(int argc, char **argv) } ret = find_mount_root(fullpath, &mnt); -if (ret < 0) { -fprintf(stderr, "ERROR: find_mount_root failed on %s: " -"%s\n", fullpath, strerror(-ret)); +if (ret < 0) goto out; -} ret = 1; svpath = get_subvol_name(mnt, fullpath); diff --git a/utils.c b/utils.c index 507ec6c..07173ee 100644 --- a/utils.c +++ b/utils.c @@ -2417,13 +2417,19 @@ int find_mount_root(const char *path, char **mount_root) char *longest_match = NULL; fd = open(path, O_RDONLY | O_NOATIME); -if (fd < 0) +if (fd < 0) { +fprintf(stderr, "ERROR: Failed to open %s: %s\n", +path, strerror(errno)); It drops part of the original messages. It doesn't show this error is from find_mount_root(). I think the original meaning should be kept as is. What do you think? I think it is strange for common users to show the name of an internal function. Maybe we should introduce two kinds of messages, one for the common users, the other for the developers to debug. Thanks Miao I agree with Miao's idea. It's true that some developers need to get info from the output, but IMO the error messages are often used to indicate what *users* did wrong, since most problems are caused by wrong parameters given by users. For example, I always forget to run 'btrfs fi df /mnt' as root, and the 'Operation not permitted' message makes me realize the permission problem. And the function name or other messages are less important than that. On the other hand, if developers encounter problems, they will gdb the program or grep the source to find out the problem. So a function name in the error message doesn't seem so necessary to me. OK, I got it. 
Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com It would also be a great idea to add a new framework to show debug messages, but I'd prefer to build that framework some time later (maybe when btrfs-progs becomes more complicated than it is now?) That's nice. I think the messages of btrfs-progs are a bit messy. Satoru Thanks, Qu Thanks, Satoru return -errno; +} close(fd); mnttab = setmntent("/proc/self/mounts", "r"); -if (!mnttab) +if (!mnttab) { +fprintf(stderr, "ERROR: Failed to setmntent: %s\n", +strerror(errno)); return -errno; +}
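The convention the patch settles on — the helper prints one user-facing message (with strerror) at the point of failure and returns a negative errno, while callers merely propagate — can be sketched outside C. A minimal Python model of that pattern (hypothetical, not the actual btrfs-progs code; the mount-table walk is elided):

```python
import errno
import os
import sys

def find_mount_root(path):
    """Helper owns its error reporting: one message at the point of
    failure, return -errno.  Callers never print a duplicate."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        print("ERROR: Failed to open %s: %s" % (path, os.strerror(e.errno)),
              file=sys.stderr)
        return -e.errno, None
    os.close(fd)
    # the real helper walks /proc/self/mounts for the longest match;
    # elided here for the sketch
    return 0, "/"

def do_receive(dest_dir):
    """Caller after the patch: no second fprintf, just propagate."""
    ret, root = find_mount_root(dest_dir)
    if ret < 0:
        return -errno.EINVAL
    return 0
```

The payoff is exactly what the commit message claims: the user sees one precise message (which path failed, and why) instead of two stacked ones, and callers shrink to a bare error check.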
Btrfs transaction checksum corruption losing root of the tree bizarre UUID change.
Hi all ! So it has been some time with btrfs, and so far I was very pleased, but since I've upgraded ubuntu from 13.10 to 14.04 problems started to occur (YES I know this might be unrelated). So in the past I've had problems with btrfs which turned out to be caused by static from a printer generating some corruption in RAM, causing checksum failures on the file system - so I'm not going to assume that there is something wrong with btrfs from the start. Anyway: On my server I'm running 6 x 2TB disks in raid 10 for general storage and 2 x ~0.5 TB in raid 1 for system. Might be unrelated, but after upgrading to 14.04 I've started using Own Cloud which uses Apache + MySQL as a backing store - all data stored on the storage array, mysql was on the system array. It all started with csum errors showing up in mysql data files and in some transactions !!!. Generally the system was immediately switching all btrfs to read-only mode, forced by the kernel (don't have dmesg / syslog now). Removed offending files, problem seemed to go away and started from scratch. After 5 days the problem reappeared and now was located around the same mysql files and in files managed by apache as cloud. At this point since these files are rather dear to me I've decided to pull all stops and try to rescue as much as I can. As an exercise in btrfs management I've run btrfsck --repair - did not help. Repeated with --init-csum-tree - turned out that this left me with a blank system array. Nice ! could use some warning here. I've moved all the drives to my main rig which has a nice 16GB of ECC ram, so errors from ram, cpu, or controller should theoretically be eliminated. 
I've used the system array drives and a spare drive to extract all dear-to-me files to a newly created array (1TB + 500GB + 640GB). Ran a scrub on it and everything seemed OK. At this point I've deleted the dear-to-me files from the storage array and ran a scrub. Scrub now showed even more csum errors in transactions and one large file that was not touched FOR A VERY LONG TIME (size ~1GB). Deleted the file. Ran scrub - no errors. Copied the dear-to-me files back to the storage array. Ran scrub - no issues. Deleted the files from my backup array and decided to call it a day. Next day I decided to run a scrub once more just to be sure - this time it discovered a myriad of errors in files and transactions. Since I had no time to continue I decided to postpone to the next day - next day I started my rig and noticed that both the backup array and the storage array do not mount anymore. I was attempting to rescue the situation without any luck. Power cycled the PC and on next startup both arrays failed to mount; when I tried to mount the backup array mount told me that this specific uuid DOES NOT EXIST !?!?! my fstab uuid: fcf23e83-f165-4af0-8d1c-cd6f8d2788f4 new uuid: 771a4ed0-5859-4e10-b916-07aec4b1a60b tried to mount by /dev/sdb1 and it did mount. Tried by the new uuid and it did mount as well. Scrub passes with flying colours on the backup array while the storage array still fails to mount with: root@ubuntu-pc:~# mount /dev/sdd1 /arrays/@storage/ mount: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so for any device in the array. Honestly this is a question to more senior guys - what should I do now ? Chris Mason - have you got any updates to your old friend stress.sh ? If not I can try using the previous version that you provided to stress test my system - but this is a second system that exposes this erratic behaviour. Anyone - what can I do to rescue my beloved files (no sarcasm with zfs / ext4 / tapes / DVDs) ps. needless to say: SMART - no sata CRC errors, no relocated sectors, no errors whatsoever (as much as I can see). 
[PATCH v3] btrfs: remove unnecessary error check
Hi Eric, (2014/07/10 22:27), Eric Sandeen wrote: On 7/10/14, 1:44 AM, Satoru Takeuchi wrote: (2014/07/10 12:26), Eric Sandeen wrote: On Jul 9, 2014, at 10:20 PM, Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com wrote: From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com If (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC) is false, obviously trans is -ENOSPC. So we can safely remove the redundant (PTR_ERR(trans) == -ENOSPC) check. True, but now a comment like: /* Handle ENOSPC */ might still be nice... Eric, thank you for your comment. I fixed my patch. How is it now? One other thing I missed the first time, I'm sorry, notes below: === From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com If (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC) is false, obviously trans is -ENOSPC. So we can safely remove the redundant (PTR_ERR(trans) == -ENOSPC) check. Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com --- fs/btrfs/inode.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3668048..115aac3 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3803,22 +3803,23 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir) if (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC) return trans; -if (PTR_ERR(trans) == -ENOSPC) { -u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5); +/* Handle ENOSPC */ -trans = btrfs_start_transaction(root, 0); -if (IS_ERR(trans)) -return trans; -ret = btrfs_cond_migrate_bytes(root->fs_info, - root->fs_info->trans_block_rsv, - num_bytes, 5); -if (ret) { -btrfs_end_transaction(trans, root); -return ERR_PTR(ret); -} -trans->block_rsv = root->fs_info->trans_block_rsv; -trans->bytes_reserved = num_bytes; +u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5); This variable should be declared at the beginning of the function, not in the middle, because it's no longer in a separate code block. OK, moved. 
Also, somehow by the time the patch got here, tabs turned into 4 spaces, so this one wouldn't apply for me. I didn't realize that. Thank you. Sorry for missing the variable declaration problem the first time! No problem, more review is welcome. Thank you very much :-) This is the v3 patch. === From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com If (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC) is false, obviously trans is -ENOSPC. So we can safely remove the redundant (PTR_ERR(trans) == -ENOSPC) check. Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com --- fs/btrfs/inode.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3668048..e7ac779 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3790,6 +3790,7 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir) { struct btrfs_trans_handle *trans; struct btrfs_root *root = BTRFS_I(dir)->root; + u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5); int ret; /* @@ -3803,22 +3804,21 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir) if (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC) return trans; - if (PTR_ERR(trans) == -ENOSPC) { - u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5); + /* Handle ENOSPC */ - trans = btrfs_start_transaction(root, 0); - if (IS_ERR(trans)) - return trans; - ret = btrfs_cond_migrate_bytes(root->fs_info, - root->fs_info->trans_block_rsv, - num_bytes, 5); - if (ret) { - btrfs_end_transaction(trans, root); - return ERR_PTR(ret); - } - trans->block_rsv = root->fs_info->trans_block_rsv; - trans->bytes_reserved = num_bytes; + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) + return trans; + ret = btrfs_cond_migrate_bytes(root->fs_info, + root->fs_info->trans_block_rsv, + num_bytes, 5); + if (ret) { + btrfs_end_transaction(trans, root); + return ERR_PTR(ret); } + trans->block_rsv = root->fs_info->trans_block_rsv; + trans->bytes_reserved = num_bytes; + return trans; } 
-- 1.9.3
Re: Btrfs transaction checksum corruption losing root of the tree bizarre UUID change.
On 07/10/2014 07:32 PM, Tomasz Kusmierz wrote: Hi all ! So it has been some time with btrfs, and so far I was very pleased, but since I've upgraded ubuntu from 13.10 to 14.04 problems started to occur (YES I know this might be unrelated). So in the past I've had problems with btrfs which turned out to be caused by static from a printer generating some corruption in RAM, causing checksum failures on the file system - so I'm not going to assume that there is something wrong with btrfs from the start. Anyway: On my server I'm running 6 x 2TB disks in raid 10 for general storage and 2 x ~0.5 TB in raid 1 for system. Might be unrelated, but after upgrading to 14.04 I've started using Own Cloud which uses Apache + MySQL as a backing store - all data stored on the storage array, mysql was on the system array. It all started with csum errors showing up in mysql data files and in some transactions !!!. Generally the system was immediately switching all btrfs to read-only mode, forced by the kernel (don't have dmesg / syslog now). Removed offending files, problem seemed to go away and started from scratch. After 5 days the problem reappeared and now was located around the same mysql files and in files managed by apache as cloud. At this point since these files are rather dear to me I've decided to pull all stops and try to rescue as much as I can. As an exercise in btrfs management I've run btrfsck --repair - did not help. Repeated with --init-csum-tree - turned out that this left me with a blank system array. Nice ! could use some warning here. I know that this will eventually be pointed out by somebody, so I'm going to save them the trouble and mention that it does say on both the wiki and in the manpages that btrfsck should be a last resort (ie, after you have made sure you have backups of anything on the FS). I've moved all the drives to my main rig which has a nice 16GB of ECC ram, so errors from ram, cpu, or controller should theoretically be eliminated. 
I've used the system array drives and a spare drive to extract all dear-to-me files to a newly created array (1TB + 500GB + 640GB). Ran a scrub on it and everything seemed OK. At this point I've deleted the dear-to-me files from the storage array and ran a scrub. Scrub now showed even more csum errors in transactions and one large file that was not touched FOR A VERY LONG TIME (size ~1GB). Deleted the file. Ran scrub - no errors. Copied the dear-to-me files back to the storage array. Ran scrub - no issues. Deleted the files from my backup array and decided to call it a day. Next day I decided to run a scrub once more just to be sure - this time it discovered a myriad of errors in files and transactions. Since I had no time to continue I decided to postpone to the next day - next day I started my rig and noticed that both the backup array and the storage array do not mount anymore. I was attempting to rescue the situation without any luck. Power cycled the PC and on next startup both arrays failed to mount; when I tried to mount the backup array mount told me that this specific uuid DOES NOT EXIST !?!?! my fstab uuid: fcf23e83-f165-4af0-8d1c-cd6f8d2788f4 new uuid: 771a4ed0-5859-4e10-b916-07aec4b1a60b tried to mount by /dev/sdb1 and it did mount. Tried by the new uuid and it did mount as well. Scrub passes with flying colours on the backup array while the storage array still fails to mount with: root@ubuntu-pc:~# mount /dev/sdd1 /arrays/@storage/ mount: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so for any device in the array. Honestly this is a question to more senior guys - what should I do now ? Chris Mason - have you got any updates to your old friend stress.sh ? If not I can try using the previous version that you provided to stress test my system - but this is a second system that exposes this erratic behaviour. 
Anyone - what can I do to rescue my beloved files (no sarcasm with zfs / ext4 / tapes / DVDs) ps. needless to say: SMART - no sata CRC errors, no relocated sectors, no errors whatsoever (as much as I can see). First thing that I would do is some very heavy testing with tools like iozone and fio. I would use the verify mode from iozone to further check data integrity. My guess based on what you have said is that it is probably issues with either the storage controller (I've had issues with almost every brand of SATA controller other than Intel, AMD, Via, and Nvidia, and it almost always manifested as data corruption under heavy load), or something in the disk's firmware. I would still suggest double-checking your RAM with Memtest, and check the cables on the drives. The one other thing that I can think of is potential voltage sags from the PSU (either because the PSU is overloaded at times, or because of really noisy/poorly-conditioned line power). Of course, I may be
Re: [Question] disk_bytenr with multiple devices
Original Message Subject: [Question] disk_bytenr with multiple devices From: Zhe Zhang zhe.zhang.resea...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2014年07月11日 02:21 When a btrfs has multiple devices (e.g. /dev/sdb, /dev/sdc), how should I interpret disk_bytenr in btrfs_file_extent_item? Does it depend on the striping config? Say I used raid0, then disk_bytenr 0~64K will be on /dev/sdb, and 64K~128K on /dev/sdc? Thanks, Zhe https://btrfs.wiki.kernel.org/index.php/Data_Structures#btrfs_file_extent_item As you can see in the btrfs wiki, disk_bytenr is a *logical* address in the btrfs linear space, not a real on-disk address. If you really want the address on a device, you need to find the chunk containing the address; the stripes in the chunk item will show the raid profile and the device address on each device. Thanks, Qu
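Qu's two-step lookup — find the chunk covering the logical address, then use that chunk's stripe layout — can be sketched for the raid0 case Zhe asks about. A toy model with invented chunk tuples, not the real btrfs_chunk structure; the returned offset is relative to the chunk's slice on that device, whereas the real chunk item records an absolute per-device offset:

```python
STRIPE_LEN = 64 * 1024   # raid0 chunks are striped in 64K units

def logical_to_physical(logical, chunks):
    """Toy resolver: each chunk is (logical_start, length, devices).
    For a raid0-style chunk the 64K stripes rotate across the
    devices, so the chunk's first 64K sits on the first device, the
    next 64K on the second, and so on."""
    for start, length, devices in chunks:
        if start <= logical < start + length:
            offset = logical - start
            stripe_nr, in_stripe = divmod(offset, STRIPE_LEN)
            dev = devices[stripe_nr % len(devices)]
            # offset within this chunk's slice on that device
            phys = (stripe_nr // len(devices)) * STRIPE_LEN + in_stripe
            return dev, phys
    raise ValueError("logical address not covered by any chunk")
```

So for a two-device raid0 chunk, Zhe's intuition holds within the chunk: the first 64K of the chunk's logical range maps to the first device, the next 64K to the second — but only after the chunk lookup translates disk_bytenr out of the filesystem's linear logical space.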