Re: [PATCH] btrfs: Mem leak in btrfs_get_acl()
On Thu, 6 Jan 2011 22:45:21 +0100 (CET), Jesper Juhl j...@chaosbits.net wrote:
> It seems to me that we leak the memory allocated to 'value' in
> btrfs_get_acl() if the call to posix_acl_from_xattr() fails. Here's a
> patch that attempts to correct that problem.
>
> Signed-off-by: Jesper Juhl j...@chaosbits.net

I posted a similar patch a long time back, but it never got picked up:
http://article.gmane.org/gmane.comp.file-systems.btrfs/6164
Message-id: 1279547924-25141-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

---
 acl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Compile tested only.

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index d16..6d1410e 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -60,8 +60,10 @@ static struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
 	size = __btrfs_getxattr(inode, name, value, size);
 	if (size > 0) {
 		acl = posix_acl_from_xattr(value, size);
-		if (IS_ERR(acl))
+		if (IS_ERR(acl)) {
+			kfree(value);
 			return acl;
+		}
 		set_cached_acl(inode, type, acl);
 	}
 	kfree(value);

-aneesh
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
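For readers less familiar with this class of bug: the pattern the patch fixes is a buffer allocated before a call that can fail, with the error path returning without freeing it. A plain userspace analogue (all names below are illustrative, not the btrfs code) looks like this:

```c
/* Userspace analogue of the leak fixed above: 'value' is allocated
 * before a parse step that can fail, so the error path must free it
 * too. Names are illustrative only, not the kernel code. */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* stand-in for posix_acl_from_xattr(): rejects empty input */
static int parse_xattr(const char *buf, size_t len)
{
    (void)buf;
    return len == 0 ? -EINVAL : 0;
}

/* returns 0 on success, negative errno on failure; never leaks */
static int get_acl_like(const char *src)
{
    size_t len = strlen(src);
    char *value = malloc(len + 1);      /* like the kmalloc of 'value' */
    if (!value)
        return -ENOMEM;
    memcpy(value, src, len + 1);

    int err = parse_xattr(value, len);
    if (err) {
        free(value);                    /* the fix: free on the error path */
        return err;
    }

    free(value);                        /* normal path frees as before */
    return 0;
}
```

Running the error path (empty input) under valgrind with and without the error-path free() shows exactly the leak the patch removes.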
Bug report: parent transid failed after heavy load
Dear all,

During a move of some 60GB of data from an ext4 partition to a btrfs partition, both on the same disk, the following happened:

- my window manager froze;
- the move suspended, i.e., no more data was written to the destination or deleted from the source;
- part of top output (sorry for possible wrapping; sending this from webmail):

    PID USER  PR NI VIRT RES SHR S %CPU %MEM     TIME+ COMMAND
  13610 root  20  0    0   0   0 D   26  0.0 299:45.87 btrfs-cache-279
   1566 root  20  0    0   0   0 S   22  0.0 218:30.86 btrfs-endio-met

  (there were also a firefox instance and an npviewer.bin process still consuming CPU time, but I guess this was some flash movie happily playing along during the freeze);
- those two btrfs processes could not be terminated, not even by kill -9;
- lsof output for process 1566 (similar for the other process):

  COMMAND    PID USER FD  TYPE    DEVICE SIZE/OFF NODE NAME
  btrfs-end 1566 root cwd DIR        8,6     4096    2 /
  btrfs-end 1566 root rtd DIR        8,6     4096    2 /
  btrfs-end 1566 root txt unknown                      /proc/1566/exe

After a reboot, I am able to mount the btrfs filesystem and read data from it, but as soon as I try any write operation (even a simple touch), that command hangs, and there are two btrfs processes hanging around, just as above; dmesg gives lots of "parent transid failed" messages. My kernel is 2.6.36 (with gentoo patches).

So, the questions:

1) Is this a known problem? If so, is it fixed in a newer version? In the archive of this list, I read about others with "parent transid failed" errors, and a recovery operation (suggested by Chris Mason) using btrfs-select-super: http://www.spinics.net/lists/linux-btrfs/msg07572.html.

2) Should I try this procedure to fix my filesystem? Is there any debug information I should collect first? (I can recreate the two spinning processes by rebooting and writing to the filesystem.)

I am saddened by this failure, as this data move was actually part of an operation to switch over to btrfs completely, after using it without problems for quite a while.

Thanks for any help.
Keep up the good work.

Regards,

Arie Peterson
Re: Atomic file data replace API
On 6 January 2011 20:01, Olaf van der Spek olafvds...@gmail.com wrote:
> Hi,
> Does btrfs support atomic file data replaces?

Hi Olaf,

Yes, btrfs does support atomic replace, since kernel 2.6.30, circa June 2009. [1] Special handling was added to ext3, ext4, btrfs (and probably other Linux FSs) for your replace-via-truncate and the alternative replace-via-rename application patterns. Try reading the "Delayed allocation and the zero-length file problem" article and comments by Ted Ts'o for further discussion. [2]

Mike

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a3f23d515a2ebf0c750db80579ca57b28cbce6d
[2] http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 2:55 PM, Mike Fleetwood mike.fleetw...@googlemail.com wrote:
> On 6 January 2011 20:01, Olaf van der Spek olafvds...@gmail.com wrote:
>> Hi,
>> Does btrfs support atomic file data replaces?
>
> Hi Olaf,
> Yes, btrfs does support atomic replace, since kernel 2.6.30, circa June 2009. [1] Special handling was added to ext3, ext4, btrfs (and probably other Linux FSs) for your replace-via-truncate and the alternative replace-via-rename application patterns. Try reading the "Delayed allocation and the zero-length file problem" article and comments by Ted Ts'o for further discussion. [2]

According to Ted, replace-via-truncate and replace-via-rename are unsafe; only write, fsync, then rename is safe. A disadvantage of the rename approach is that it resets the file owner (if the writer is non-root) and has issues with meta-data and other things.

My proposal was for an open flag, O_ATOMIC, to be introduced to tell the FS that the whole file update should be done atomically. Ted says this is too hard in ext4, so I was wondering if this would be possible in btrfs.

Olaf
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 3:01 PM, Olaf van der Spek olafvds...@gmail.com wrote:
> According to Ted, via-truncate and via-rename are unsafe. Only fsync, rename is safe. Disadvantage of rename is resetting file owner (if non-root), having issues with meta-data and other stuff. My proposal was for an open flag, O_ATOMIC, to be introduced to tell the FS the whole file update should be done atomically. Ted says this is too hard in ext4, so I was wondering if this would be possible in btrfs.

http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/#comment-2082
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/#comment-2089
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/#comment-2090
Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs
On Thursday 06 January 2011, Shaohua Li wrote:
> Subject: add metadata_incore ioctl in vfs
>
> Add an ioctl to dump filesystem's metadata in memory in vfs. Userspace collects such info and uses it to do metadata readahead. Filesystem can hook to super_operations.metadata_incore to get metadata in a specific approach. The next patch will give an example of how to implement .metadata_incore in btrfs.
>
> Signed-off-by: Shaohua Li shaohua...@intel.com

Looks great!

Reviewed-by: Arnd Bergmann a...@arndb.de
Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs
On Thursday 06 January 2011, Shaohua Li wrote:
> Subject: add metadata_readahead ioctl in vfs
>
> Add metadata readahead ioctl in vfs. Filesystem can hook to super_operations.metadata_readahead to handle filesystem-specific tasks. The next patch will give an example of how btrfs implements it.
>
> Signed-off-by: Shaohua Li shaohua...@intel.com

Reviewed-by: Arnd Bergmann a...@arndb.de
Re: Atomic file data replace API
Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
> Hi,
> Does btrfs support atomic file data replaces? Basically, the atomic variant of this:
>
> // old state
> open(O_TRUNC)
> write() // 0+ times
> close()
> // new state

Yes and no. We have a best effort mechanism where we try to guess that since you've done this truncate and the write that you want the writes to show up quickly. But it's a guess. The problem is the write() // 0+ times. The kernel has no idea what new result you want the file to contain because the application isn't telling us.

What btrfs can do (but we haven't yet implemented) is make sure that the results of a single write to a file are on disk atomically, even if they are replacing existing bytes in the file. Because we cow and because we don't update metadata pointers until the IO is complete, we can wait until all the IO for a given write call is on disk before we update any of the metadata.

This isn't hard, it's on my TODO list.

-chris
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 3:58 PM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
>> Does btrfs support atomic file data replaces? Basically, the atomic variant of this:
>> // old state
>> open(O_TRUNC)
>> write() // 0+ times
>> close()
>> // new state
>
> Yes and no. We have a best effort mechanism where we try to guess that since you've done this truncate and the write that you want the writes to show up quickly. But it's a guess. The problem is the write() // 0+ times. The kernel has no idea what new result you want the file to contain because the application isn't telling us.

Isn't it safe for the kernel to wait until the first write or close before writing anything to disk?

> What btrfs can do (but we haven't yet implemented) is make sure that the results of a single write to a file are on disk atomically, even if they are replacing existing bytes in the file. Because we cow and because we don't update metadata pointers until the IO is complete, we can wait until all the IO for a given write call is on disk before we update any of the metadata. This isn't hard, it's on my TODO list.

What about a new flag, O_ATOMIC, that'd take the guesswork out of the kernel?

Olaf
Re: Atomic file data replace API
Excerpts from Olaf van der Spek's message of 2011-01-07 10:01:59 -0500:
> On Fri, Jan 7, 2011 at 3:58 PM, Chris Mason chris.ma...@oracle.com wrote:
>> Yes and no. We have a best effort mechanism where we try to guess that since you've done this truncate and the write that you want the writes to show up quickly. But it's a guess. The problem is the write() // 0+ times. The kernel has no idea what new result you want the file to contain because the application isn't telling us.
>
> Isn't it safe for the kernel to wait until the first write or close before writing anything to disk?

I'm afraid not. Picture an application that opens a thousand files, writes 1MB to each of them, and then doesn't close any. If we waited until close, you'd have 1GB of memory pinned or staged somehow.

>> What btrfs can do (but we haven't yet implemented) is make sure that the results of a single write to a file are on disk atomically, even if they are replacing existing bytes in the file. Because we cow and because we don't update metadata pointers until the IO is complete, we can wait until all the IO for a given write call is on disk before we update any of the metadata. This isn't hard, it's on my TODO list.
>
> What about a new flag, O_ATOMIC, that'd take the guesswork out of the kernel?

We can't guess beyond a single write call. Otherwise we get into the problem above where an application can force the kernel to wait forever. I'm not against O_ATOMIC to enable the new btrfs functionality, but it will still be limited to one write.

-chris
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 4:05 PM, Chris Mason chris.ma...@oracle.com wrote:
>>> The problem is the write() // 0+ times. The kernel has no idea what new result you want the file to contain because the application isn't telling us.
>>
>> Isn't it safe for the kernel to wait until the first write or close before writing anything to disk?
>
> I'm afraid not. Picture an application that opens a thousand files and writes 1MB to each of them, and then doesn't close any. If we waited until close, you'd have 1GB of memory pinned or staged somehow.

That's not what I asked. ;) I asked to wait until the first write (or close). That way, you don't get unintentional empty files. One step further: you don't have to keep the data in memory, you're free to write it to disk. You just wouldn't update the meta-data (yet).

>>> This isn't hard, it's on my TODO list.
>>
>> What about a new flag, O_ATOMIC, that'd take the guesswork out of the kernel?
>
> We can't guess beyond a single write call. Otherwise we get into the problem above where an application can force the kernel to wait forever. I'm not against O_ATOMIC to enable the new btrfs functionality, but it will still be limited to one write.
>
> -chris

Olaf
Re: Atomic file data replace API
Excerpts from Olaf van der Spek's message of 2011-01-07 10:08:24 -0500:
> On Fri, Jan 7, 2011 at 4:05 PM, Chris Mason chris.ma...@oracle.com wrote:
>> The problem is the write() // 0+ times. The kernel has no idea what new result you want the file to contain because the application isn't telling us.
>
> That's not what I asked. ;) I asked to wait until the first write (or close). That way, you don't get unintentional empty files. One step further, you don't have to keep the data in memory, you're free to write them to disk. You just wouldn't update the meta-data (yet).

Sorry ;) Picture an application that truncates 1024 files without closing any of them. Basically any operation that includes the kernel waiting for applications because they promise to do something soon is a denial of service attack, or a really easy way to run out of memory on the box.

-chris
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason chris.ma...@oracle.com wrote:
>> That's not what I asked. ;) I asked to wait until the first write (or close). That way, you don't get unintentional empty files. One step further, you don't have to keep the data in memory, you're free to write them to disk. You just wouldn't update the meta-data (yet).
>
> Sorry ;) Picture an application that truncates 1024 files without closing any of them. Basically any operation that includes the kernel waiting for applications because they promise to do something soon is a denial of service attack, or a really easy way to run out of memory on the box.

I'm not sure why you would run out of memory in that case.

O_ATOMIC would be the solution for the rename workaround (write temp file, rename), with advantages like a much simpler API, no issues with resetting meta-data, no issues with a temp file, and maybe better performance.

Olaf
Re: Atomic file data replace API
Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
> On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason chris.ma...@oracle.com wrote:
>> Sorry ;) Picture an application that truncates 1024 files without closing any of them. Basically any operation that includes the kernel waiting for applications because they promise to do something soon is a denial of service attack, or a really easy way to run out of memory on the box.
>
> I'm not sure why you would run out of memory in that case.

Well, let's make sure I've got a good handle on the proposed interface:

1) fd = open(some_file, O_ATOMIC)
2) truncate(fd, 0)
3) write(fd, new data)

The semantics are that we promise not to let the truncate hit the disk until the application does the write. We have a few choices on how we do this:

1) Leave the disk untouched, but keep something in memory that says this inode is really truncated
2) Record on disk that we've done our atomic truncate but it is still pending. We'd need some way to remove or invalidate this record after a crash.
3) Go ahead and do the operation but don't allow the transaction to commit until the write is done.

Option #1: keep something in memory. Well, any time we have a requirement to pin something in memory until userland decides to do a write, we risk oom.

Option #2: disk format change. Actually somewhat complex, because if we haven't crashed, we need to be able to read the inode in again without invalidating the record, but if we do crash, we have to invalidate the record. Not impossible, but not trivial.

Option #3: Pin the whole transaction. Depending on the FS this may be impossible. Certain operations require us to commit the transaction to reclaim space, and we cannot allow userland to put that on hold without deadlocking.

What most people don't realize about the crash safe filesystems is that they don't have fine grained transactions. There is one single transaction for all the operations done. This is mostly because it is less complex and much faster, but it also makes any 'pin the whole transaction' type system unusable.

-chris
Re: Synching a Backup Server
On Friday, January 07, 2011 00:07:37 Carl Cook wrote:
> On Thu 06 January 2011 14:26:30 Carl Cook wrote:
> According To Doyle... Er, Hoyle...
>
> I am trying to create a multi-device BTRFS system using two identical drives. I want them to be raid 0 for no redundancy, and a total of 4TB. But in the wiki it says nothing about using fdisk to set up the drive first. It just basically says for me to:
>
> mkfs.btrfs -m raid0 /dev/sdc /dev/sdd

I'd suggest at least

mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd

if you really want raid0.

> Seems to me that for mdadm I had to set each drive as a raid member, assemble the array, then format. Is this not the case with BTRFS?
>
> Also in the wiki it says "After a reboot or reloading the btrfs module, you'll need to use btrfs device scan to discover all multi-device filesystems on the machine." Is this not done automatically? Do I have to set up some script to do this?

--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 5:12 PM, Chris Mason chris.ma...@oracle.com wrote:
>> I'm not sure why you would run out of memory in that case.
>
> Well, let's make sure I've got a good handle on the proposed interface:
> 1) fd = open(some_file, O_ATOMIC)

No, O_TRUNC should be used in open. Maybe it works with a separate truncate too.

> 2) truncate(fd, 0)
> 3) write(fd, new data)
>
> The semantics are that we promise not to let the truncate hit the disk until the application does the write. We have a few choices on how we do this:
> 1) Leave the disk untouched, but keep something in memory that says this inode is really truncated
> 2) Record on disk that we've done our atomic truncate but it is still pending. We'd need some way to remove or invalidate this record after a crash.
> 3) Go ahead and do the operation but don't allow the transaction to commit until the write is done.
>
> option #1: keep something in memory. Well, any time we have a requirement to pin something in memory until userland decides to do a write, we risk oom.

Since the file is open, you have to keep something in memory anyway, right? Adding a bit (or bool) does not make a difference IMO. Isn't this comparable to opening a temp file?

> option #2: disk format change. Actually somewhat complex because if we haven't crashed, we need to be able to read the inode in again without invalidating the record, but if we do crash, we have to invalidate the record. Not impossible, but not trivial.
> option #3: Pin the whole transaction. Depending on the FS this may be impossible. Certain operations require us to commit the transaction to reclaim space, and we cannot allow userland to put that on hold without deadlocking.

#1 is the only one that makes sense.

> What most people don't realize about the crash safe filesystems is they don't have fine grained transactions. There is one single transaction for all the operations done. This is mostly because it is less complex and much faster, but it also makes any 'pin the whole transaction' type system unusable.

AFAIK the cost is mostly more complex code / runtime. The cost is not disk performance.

Olaf
Re: Synching a Backup Server
On Thursday, January 06, 2011 22:52:25 Freddie Cash wrote: On Thu, Jan 6, 2011 at 1:42 PM, Carl Cook cac...@quantum-sci.com wrote: On Thu 06 January 2011 11:16:49 Freddie Cash wrote: Also with this system, I'm concerned that if there is corruption on the HTPC, it could be propagated to the backup server. Is there some way to address this? Longer intervals to sync, so I have a chance to discover? Using snapshots on the backup server allows you to go back in time to recover files that may have been accidentally deleted, or to recover files that have been corrupted. How? I can see that rsync will not transfer the files that have not changed, but I assume it transfers the changed ones. How can you go back in time? Is there like a snapshot file that records the state of all files there? I don't know the specifics of how it works in btrfs, but it should be similar to how ZFS does it. The gist of it is: Each snapshot gives you a point-in-time view of the entire filesystem. Each snapshot can be mounted (ZFS is read-only; btrfs is read-only or read-write). So, you mount the snapshot for 2010-12-15 onto /mnt, then cd to the directory you want (/mnt/htpc/home/fcash/videos/) and copy the file out that you want to restore (cp coolvid.avi ~/). With ZFS, things are nice and simple: - each filesystem has a .zfs/snapshot directory - in there are sub-directories, each named after the snapshot name - cd into the snapshot name, the OS auto-mounts the snapshot, and off you go Btrfs should be similar? Don't know the specifics. How it works internally, is some of the magic and the beauty of Copy-on-Write filesystems. 
:)

I usually create subvolumes in the btrfs root volume:

/mnt/btrfs/
|- server-a
|- server-b
\- server-c

then create snapshots of these directories:

/mnt/btrfs/
|- server-a
|- server-b
|- server-c
|- snapshots-server-a
|    |- @GMT-2010.12.21-16.48.09
|    \- @GMT-2010.12.22-16.45.14
|- snapshots-server-b
\- snapshots-server-c

This way I can use the shadow_copy module for samba to publish the snapshots to windows clients.

--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
Re: Atomic file data replace API
On Friday, January 07, 2011 17:12:11 Chris Mason wrote:
> Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
>> I'm not sure why you would run out of memory in that case.
>
> Well, let's make sure I've got a good handle on the proposed interface:
>
> 1) fd = open(some_file, O_ATOMIC)
> 2) truncate(fd, 0)
> 3) write(fd, new data)
>
> The semantics are that we promise not to let the truncate hit the disk until the application does the write. We have a few choices on how we do this:
>
> 1) Leave the disk untouched, but keep something in memory that says this inode is really truncated
> 2) Record on disk that we've done our atomic truncate but it is still pending. We'd need some way to remove or invalidate this record after a crash.
> 3) Go ahead and do the operation but don't allow the transaction to commit until the write is done.
>
> option #1: keep something in memory. Well, any time we have a requirement to pin something in memory until userland decides to do a write, we risk oom.

Userland already has a file descriptor allocated (which can fail anyway because of OOM); I see no problem in increasing kernel memory usage by 4 bytes (if not less) just to note that the application wants to see the file as truncated (1 bit) and that the next write has to be atomic (2nd bit?).

--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
Re: Atomic file data replace API
Are you suggesting to do:

1) fopen with O_TRUNC, O_ATOMIC: returns an fd to a temporary file
2) the application writes to that fd, with one or more system calls, in a short time or a long time, at will
3) at fclose (or even at fsync) atomically swap the data pointer of the real file with the temp file, then delete the temp, in a manner transparent to userland (something similar to e4defrag)

Is this summary correct?

Massimo Maggi

On 07/01/2011 16:17, Olaf van der Spek wrote:
> On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason chris.ma...@oracle.com wrote:
>> Sorry ;) Picture an application that truncates 1024 files without closing any of them. Basically any operation that includes the kernel waiting for applications because they promise to do something soon is a denial of service attack, or a really easy way to run out of memory on the box.
>
> I'm not sure why you would run out of memory in that case.
>
> O_ATOMIC would be the solution for the rename workaround: write temp file, rename. With advantages like a way simpler API, no issues with resetting meta-data, no issues with temp file and maybe better performance.
>
> Olaf
Re: Atomic file data replace API
On Fri, Jan 7, 2011 at 5:32 PM, Massimo Maggi mass...@.it wrote:
> Are you suggesting to do:
> 1) fopen with O_TRUNC, O_ATOMIC: returns an fd to a temporary file
> 2) the application writes to that fd, with one or more system calls, in a short time or a long time, at will
> 3) at fclose (or even at fsync) atomically swap the data pointer of the real file with the temp file, then delete the temp, in a manner transparent to userland (something similar to e4defrag)
> Is this summary correct?

Almost. The swap should probably not be done at fsync time. Other open references (for example running executables) should be swapped too. The new-file case has to be handled too.

Olaf
Re: hunt for 2.6.37 dm-crypt+ext4 corruption?
On Thu, Jan 6, 2011 at 4:56 PM, Heinz Diehl h...@fancy-poultry.org wrote:
> On 05.12.2010, Milan Broz wrote:
>> It still seems like dmcrypt with its parallel processing is just a trigger for another bug in 37-rc.
>
> To come back to this: my 3 systems (XFS filesystem) running the latest dm-crypt-scale-to-multiple-cpus patch from Andi Kleen/Milan Broz have not shown a single problem since 2.6.37-rc6 and above. No corruption any longer, no freezes, nothing. The patch applies cleanly to 2.6.37, too, and runs just fine. I blindly guess that my data corruption problem was related to something else in the 2.6.37-rc series up to -rc4/5. Since this patch is a significant improvement: any chance that it finally gets merged into mainline/stable?

Hi Heinz,

I've been using this patch since 2.6.37-rc6+ with ext4 and xfs filesystems and haven't seen any corruptions since then (ext4 got fixed in 2.6.37-rc6; xfs showed no problems from the start).

http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1449032be17abb69116dbc393f67ceb8bd034f92 (the actual temporary fix for ext4)

Regards,
Matt
Various Questions
On Fri 07 January 2011 08:14:17 Hubert Kario wrote: I'd suggest at least mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd if you really want raid0 I don't fully understand -m or -d. Why would this make a truer raid0 that with no options? Is it necessary to use fdisk on new drives in creating a BTRFS multi-drive array? Or is this all that's needed: # mkfs.btrfs /dev/sdb /dev/sdc # btrfs filesystem show Is this related to 'subvolumes'? The FAQ implies that a subvolume is like a directory, but also like a partition. What's the rationale for being able to create a subvolume under a subvolume, as Hubert says so he can use the shadow_copy module for samba to publish the snapshots to windows clients. I don't have any windows clients, but what difference does his structure make? I know that if using SATA+LVM, turn off the writeback cache on the drive, as it doesn't do cash flushing, and ensure NCQ is on. But does this also apply to a BTRFS array? If so, is this done in rc.local with hdparm -I /dev/sdb hdparm -I /dev/sdc How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync? If using the --numeric-ids switch for rsync, do you just have to manually make sure the IDs and usernames are the same on source and destination machines? For files that fail to transfer, wouldn't it be wise to use --partial-dir=DIR to at least recover part of lost files? The rsync man page says that rsync uses ssh by default, but is that the case? I think -e may be related to engaging ssh, but don't understand the explanation. 
So for my system where there is a backup server, I guess I run the rsync daemon on the backup server which presents a port, then when the other systems decide it's time for a backup (cron) they: - stop mysql, dump the database somewhere, start mysql; - connect to the backup server's rsync port and dump their data to (hopefully) some specific place there. Right?
Re: Various Questions
On Fri, Jan 7, 2011 at 11:15 AM, Carl Cook cac...@quantum-sci.com wrote: On Fri 07 January 2011 08:14:17 Hubert Kario wrote: I'd suggest at least mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd if you really want raid0 I don't fully understand -m or -d. Why would this make a truer raid0 than with no options? this will give you RAID0 for your data, but RAID1 for your metadata, making it less likely that the FS itself gets corrupted, even though you will lose some data in crash cases, if i understand correctly. Is it necessary to use fdisk on new drives in creating a BTRFS multi-drive array? Or is this all that's needed: # mkfs.btrfs /dev/sdb /dev/sdc # btrfs filesystem show depends on whether you need /boot partitions or other partitions. what you have works fine though. Is this related to 'subvolumes'? The FAQ implies that a subvolume is like a directory, but also like a partition. What's the rationale for being able to create a subvolume under a subvolume, as Hubert says so he can use the shadow_copy module for samba to publish the snapshots to windows clients. I don't have any windows clients, but what difference does his structure make? just his preference to put it there... the snapshot of a snapshot can go anywhere. it doesn't have to reside under its parent, the parent was just used as a base, it's not bound to it in any way AFAIK. How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync? the links and command Freddie Cash posted are a really good base to work from. 
So for my system where there is a backup server, I guess I run the rsync daemon on the backup server which presents a port, then when the other systems decide it's time for a backup (cron) they: - stop mysql, dump the database somewhere, start mysql; - connect to the backup server's rsync port and dump their data to (hopefully) some specific place there. Right? you don't have to stop mysql, you just need to freeze any new, incoming writes, and flush (i.e. let finish) whatever is happening right now. this ensures mysql is _internally_ consistent on the disk. see comment by Lloyd Standish here: http://dev.mysql.com/doc/refman/5.1/en/backup-methods.html C Anthony
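A minimal sketch of the "freeze writes, flush, dump" approach described above, using mysqldump's own locking rather than stopping the server. The dump file path is a placeholder, and MYSQLDUMP is overridable so the function can be dry-run; adjust to your setup.

```shell
# Sketch: consistent MySQL dump without stopping the server.
dump_mysql() {
    dumpfile=$1
    # --single-transaction takes a consistent snapshot for InnoDB tables
    # without blocking writers; for MyISAM use --lock-all-tables instead,
    # which flushes and read-locks every table for the duration of the dump.
    "${MYSQLDUMP:-mysqldump}" --all-databases --single-transaction > "$dumpfile"
}
```

rsync can then pick up the resulting dump file along with the rest of the data on the next backup run.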
Re: Various Questions
On Fri, Jan 7, 2011 at 9:15 AM, Carl Cook cac...@quantum-sci.com wrote: How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync? Who cares which ones are on by default? List the ones you want to use on the command-line, every time. That way, if the defaults change, your setup won't. If using the --numeric-ids switch for rsync, do you just have to manually make sure the IDs and usernames are the same on source and destination machines? You use the --numeric-ids switch so that it *doesn't* matter if the IDs/usernames are the same. It just sends the ID number on the wire. Sure, if you do an ls on the backup box, the username will appear to be messed up. But if you compare the user ID assigned to the file, and the user ID in the backed-up etc/passwd file, they are correct. Then, if you ever need to restore the HTPC from backups, the etc/passwd file is transferred over, the user IDs are transferred over, and when you do an ls on the HTPC, everything matches up correctly. For files that fail to transfer, wouldn't it be wise to use --partial-dir=DIR to at least recover part of lost files? Or, just run rsync again, if the connection is dropped. The rsync man page says that rsync uses ssh by default, but is that the case? I think -e may be related to engaging ssh, but don't understand the explanation. Does it matter what the default is, if you specify exactly how you want it to work on the command-line? So for my system where there is a backup server, I guess I run the rsync daemon on the backup server which presents a port, then when the other systems decide it's time for a backup (cron) they: - stop mysql, dump the database somewhere, start mysql; - connect to the backup server's rsync port and dump their data to (hopefully) some specific place there. Right? That's one way (push backups). 
It works ok for small numbers of systems being backed up. But get above a handful of machines, and it gets very hard to time everything so that you don't hammer the disks on the backup server. Pull backups (backups server does everything) works better, in my experience. Then you just script things up once, run 1 script, worry about 1 schedule, and everything is stored on the backups server. No need to run rsync daemons everywhere, just run the rsync client, using -e ssh, and let it do everything. If you need it to run a script on the remote machine first, that's easy enough to do: - ssh to remote system, run script to stop DBs, dump DBs, snapshot FS, whatever - then run rsync - ssh to remote system run script to start DBs, delete snapshot, whatever You're starting to over-think things. Keep it simple, don't worry about defaults, specify everything you want to do, and do it all from the backups box. -- Freddie Cash fjwc...@gmail.com
Re: Various Questions
Wow, this rsync and backup system is pretty amazing. I've always just tarred each directory manually, but now find I can RELIABLY automate backups, and have SOLID versioning to boot. Thanks to everyone who advised, especially Freddie and Anthony. I am still waiting for hardware for my backup server, but have been preparing. On the backup server I'll be doing pull backups for everything except my phone (which is connected intermittently). I'm going to set up a cron script on the backup server to pull backups once a week (as opposed to once/mo which I've done for 12 years). I am at a loss how to lock the database on the HTPC while exporting the dump, as per Lloyd Standish, but will study it. (Freddie gave a nice script, but it doesn't seem to lock/flush first) Also don't know how to email results/success/fail on completion, as I'm not much of a coder. But here is my proposed cron: btrfs subvolume snapshot hex:///home /media/backups/snapshots/hex-{DATE} rsync --archive --hard-links --delete-during --delete-excluded --inplace --numeric-ids -e ssh --exclude-from=/media/backups/exclude-hex hex:///home /media/backups/hex btrfs subvolume snapshot droog:///home /media/backups/snapshots/droog-{DATE} rsync --archive --hard-links --delete-during --delete-excluded --inplace --numeric-ids -e ssh --exclude-from=/media/backups/exclude-droog droog:///home /media/backups/droog My root filesystems are ext4, so I guess they cannot be snapshotted before backup. My home directories are/will be BTRFS though. On Fri 07 January 2011 08:14:17 Hubert Kario wrote: I'd suggest at least mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd if you really want raid0 I don't fully understand -m or -d. Why would this make a truer raid0 than with no options? I am beginning to suspect that this is the -default- behavior, as described in the wiki: # Create a filesystem across four drives (metadata mirrored, data striped) Should I turn off the writeback cache on each drive when running BTRFS? 
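A possible shape for the cron job sketched above (untested against a real setup; host names and paths are the poster's). Two assumptions worth flagging: rsync remote paths take a single colon ("hex:/home", not "hex:///home"), and btrfs cannot snapshot a remote filesystem, so the snapshot command has to run on the remote host, e.g. via ssh. SSH and RSYNC are overridable so the function can be dry-run.

```shell
# Sketch: pull backup of one host, snapshot taken remotely first.
backup_host() {
    host=$1
    # take the snapshot on the remote machine itself
    "${SSH:-ssh}" "$host" "btrfs subvolume snapshot /home /home/.snapshots/$(date +%F)"
    # then pull the live /home over ssh
    "${RSYNC:-rsync}" --archive --hard-links --delete-during --delete-excluded \
        --inplace --numeric-ids -e "${SSH:-ssh}" \
        --exclude-from="/media/backups/exclude-$host" \
        "$host:/home/" "/media/backups/$host/"
}
```

The cron entry then reduces to `backup_host hex` and `backup_host droog`.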
open_ctree failed, unable to mount the fs
I got a power cycle, after which I'm no longer able to mount btrfs filesystem: device fsid x-y devid 1 transid 169686 /dev/vda3 device fsid x-y devid 1 transid 169686 /dev/vda3 parent transid verify failed on 3260289024 wanted 169686 found 169685 parent transid verify failed on 3260289024 wanted 169686 found 169685 parent transid verify failed on 3260289024 wanted 169686 found 169685 btrfs: open_ctree failed Tried to get that mounted with 2.6.35 and 2.6.37, without success. Is there a way to fix it? -- Tomasz Chmielewski http://wpkg.org
Re: open_ctree failed, unable to mount the fs
On Fri, Jan 07, 2011 at 08:01:47PM +0100, Tomasz Chmielewski wrote: I got a power cycle, after which I'm no longer able to mount btrfs filesystem: device fsid x-y devid 1 transid 169686 /dev/vda3 device fsid x-y devid 1 transid 169686 /dev/vda3 parent transid verify failed on 3260289024 wanted 169686 found 169685 parent transid verify failed on 3260289024 wanted 169686 found 169685 parent transid verify failed on 3260289024 wanted 169686 found 169685 btrfs: open_ctree failed Tried to get that mounted with 2.6.35 and 2.6.37, without success. Is there a way to fix it? The forthcoming[1] btrfsck tool should handle that particular error, I believe. To prevent it from happening again, ensure that you have working barriers on your disks, or that you turn off write caching on the drives at every boot. Hugo. [1] out real soon now -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Well, sir, the floor is yours. But remember, the --- roof is ours!
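Hugo's second suggestion, spelled out as an /etc/rc.local fragment (a sketch; device names are examples). Note that hdparm -I only *prints* the drive's identify data; -W 0 is the flag that actually disables the volatile write cache. This is only needed when barriers are not working on your setup, since btrfs relies on them for write ordering.

```shell
# /etc/rc.local fragment: disable the drives' volatile write cache at boot.
# hdparm -W 0 turns write caching off (-I merely displays drive info).
hdparm -W 0 /dev/sdb
hdparm -W 0 /dev/sdc
```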
Re: open_ctree failed, unable to mount the fs
The forthcoming[1] btrfsck tool should handle that particular error, I believe. I noticed a similar problem was discussed here, with a solution: http://www.spinics.net/lists/linux-btrfs/msg07572.html where a btrfs-selects-super was used: git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next (or git pull into your existing checkout) Then make btrfs-select-super ./btrfs-selects-super -s 1 /dev/xxx After this you'll want to do a full backup and make sure things are working properly. However, I don't see the tool when I clone the latest git - am I missing something? FYI, btrfsck -s 1 /dev/vda3 says: using SB copy 1, bytenr 67108864 fs tree 265 refs 1 not found unresolved ref root 284 dir 115450 index 2 namelen 19 name 2010-10-17-05:15:23 error 600 fs tree 274 refs 1 not found unresolved ref root 284 dir 115450 index 3 namelen 19 name 2010-10-24-05:15:20 error 600 fs tree 277 refs 1 not found unresolved ref root 284 dir 115392 index 20 namelen 19 name 2010-10-26-05:15:21 error 600 fs tree 278 refs 1 not found unresolved ref root 284 dir 115392 index 21 namelen 19 name 2010-10-27-05:15:23 error 600 fs tree 279 refs 1 not found unresolved ref root 284 dir 115392 index 22 namelen 19 name 2010-10-28-05:18:28 error 600 fs tree 280 refs 1 not found unresolved ref root 284 dir 115392 index 23 namelen 19 name 2010-10-29-05:15:21 error 600 fs tree 281 refs 1 not found unresolved ref root 284 dir 115392 index 24 namelen 19 name 2010-10-30-05:15:27 error 600 fs tree 282 refs 1 not found unresolved ref root 284 dir 115450 index 4 namelen 19 name 2010-10-31-05:15:21 error 600 fs tree 283 refs 1 not found unresolved ref root 284 dir 115392 index 25 namelen 19 name 2010-10-31-05:15:21 error 600 fs tree 284 refs 13 unresolved ref root 284 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 319 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 340 dir 115453 index 2 namelen 19 name 
2010-11-01-05:15:26 error 600 unresolved ref root 348 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 355 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 357 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 358 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 359 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 360 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 361 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 362 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 unresolved ref root 363 dir 115453 index 2 namelen 19 name 2010-11-01-05:15:26 error 600 fs tree 299 refs 1 not found unresolved ref root 319 dir 115450 index 6 namelen 19 name 2010-11-14-05:15:25 error 600 fs tree 307 refs 1 not found unresolved ref root 319 dir 115450 index 7 namelen 19 name 2010-11-21-05:18:23 error 600 fs tree 312 refs 1 not found unresolved ref root 319 dir 115392 index 50 namelen 19 name 2010-11-25-05:15:25 error 600 fs tree 313 refs 1 not found unresolved ref root 319 dir 115392 index 51 namelen 19 name 2010-11-26-05:15:24 error 600 fs tree 314 refs 1 not found unresolved ref root 319 dir 115392 index 52 namelen 19 name 2010-11-27-05:15:27 error 600 fs tree 315 refs 2 not found unresolved ref root 319 dir 115450 index 8 namelen 19 name 2010-11-28-05:15:22 error 600 unresolved ref root 340 dir 115450 index 8 namelen 19 name 2010-11-28-05:15:22 error 600 fs tree 316 refs 1 not found unresolved ref root 319 dir 115392 index 53 namelen 19 name 2010-11-28-05:15:22 error 600 fs tree 317 refs 1 not found unresolved ref root 319 dir 115392 index 54 namelen 19 name 2010-11-29-05:15:25 error 600 fs tree 318 refs 1 not found unresolved ref root 319 dir 115392 index 55 namelen 19 name 2010-11-30-05:15:24 error 600 fs tree 319 
refs 12 unresolved ref root 319 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29 error 600 unresolved ref root 340 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29 error 600 unresolved ref root 348 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29 error 600 unresolved ref root 355 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29 error 600 unresolved ref root 357 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29 error 600 unresolved ref root 358 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29 error 600 unresolved ref root 359 dir 115453 index 3 namelen 19 name 2010-12-01-05:15:29
Re: Atomic file data replace API
Excerpts from Hubert Kario's message of 2011-01-07 11:26:02 -0500: On Friday, January 07, 2011 17:12:11 Chris Mason wrote: Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500: On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason chris.ma...@oracle.com wrote: That's not what I asked. ;) I asked to wait until the first write (or close). That way, you don't get unintentional empty files. One step further, you don't have to keep the data in memory, you're free to write it to disk. You just wouldn't update the meta-data (yet). Sorry ;) Picture an application that truncates 1024 files without closing any of them. Basically any operation that includes the kernel waiting for applications because they promise to do something soon is a denial of service attack, or a really easy way to run out of memory on the box. I'm not sure why you would run out of memory in that case. Well, let's make sure I've got a good handle on the proposed interface: 1) fd = open(some_file, O_ATOMIC) 2) truncate(fd, 0) 3) write(fd, new data) The semantics are that we promise not to let the truncate hit the disk until the application does the write. We have a few choices on how we do this: 1) Leave the disk untouched, but keep something in memory that says this inode is really truncated 2) Record on disk that we've done our atomic truncate but it is still pending. We'd need some way to remove or invalidate this record after a crash. 3) Go ahead and do the operation but don't allow the transaction to commit until the write is done. option #1: keep something in memory. Well, any time we have a requirement to pin something in memory until userland decides to do a write, we risk oom. Userland already has a file descriptor allocated (which can fail anyway because of OOM), I see no problem in increasing the size of kernel memory usage by 4 bytes (if not less) just to note that the application wants to see the file as truncated (1 bit) and the next write has to be atomic (2nd bit?). 
The exact amount of tracking is going to vary. The reason why is that actually doing the truncate is an O(size of the file) operation and so you can't just flip a switch when the write or the close comes in. You have to run through all the metadata of the file and do something temporary with each part that is only completed when the file IO is actually done. Honestly, there are many different ways to solve this in the application. Requiring high-speed atomic replacement of individual file contents is a recipe for frustration. -chris
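One of the application-level approaches Chris alludes to is the classic write-to-a-temporary-file-then-rename idiom: rename(2) within a directory is atomic, so readers see either the whole old file or the whole new one, never a half-written file. A minimal sketch (file names are illustrative; assumes GNU coreutils, whose sync accepts a file argument):

```shell
set -e
dir=$(mktemp -d)
echo "old contents" > "$dir/settings.conf"

tmp=$(mktemp "$dir/settings.conf.XXXXXX")   # temp file in the same directory
echo "new contents" > "$tmp"                # write the full replacement
sync "$tmp"                                 # flush the data before the rename
mv -f "$tmp" "$dir/settings.conf"           # atomic replace via rename(2)

cat "$dir/settings.conf"                    # -> new contents
```

The flush-before-rename step is what the thread's ext4 discussion is about: without it, a crash between the rename and writeback can leave the new name pointing at unwritten data.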
Re: open_ctree failed, unable to mount the fs
On Fri, Jan 7, 2011 at 1:25 PM, Tomasz Chmielewski man...@wpkg.org wrote: The forthcoming[1] btrfsck tool should handle that particular error, I believe. I noticed a similar problem was discussed here, with a solution: http://www.spinics.net/lists/linux-btrfs/msg07572.html where a btrfs-selects-super was used: git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next (or git pull into your existing checkout) Then make btrfs-select-super ./btrfs-selects-super -s 1 /dev/xxx After this you'll want to do a full backup and make sure things are working properly. However, I don't see the tool when I clone the latest git - am I missing something? It's not built by the makefile by default; make btrfs-select-super as stated above will make it.
Re: Atomic file data replace API
Olaf van der Spek wrote: On Fri, Jan 7, 2011 at 5:32 PM, Massimo Maggi mass...@.it wrote: Are you suggesting to do: 1)fopen with O_TRUNC, O_ATOMIC: returns fd to a temporary file 2)application writes to that fd, with one or more system calls, in a short time or in long time, at his will. 3)at fclose (or even at fsync ) atomically swap data pointer of real file with temp file, then delete temp. In a transparent mode to userland. (something similar to e4defrag). Is this sum up correct? Almost. Swap should probably not be done at fsync time. Other open references (for example running executables) should be swapped too. What is the visibility of the changes for other processes supposed to be in the meantime? I.e., if things happen in this order: 1. Process A does fda = open(foo.txt, O_TRUNC|O_ATOMIC) 2. Process B does fdb = open(foo.txt, O_RDONLY) 3. B does read(fdb, buf, 4096) 4. A does write(fda, NEW DATA\n, 9) 5. Process C comes in and does fdc = open(foo.txt, O_RDONLY) 6. C does read(fdc, buf, 4096) 7. A calls close(fda) Does B see an empty file, or does it see the old contents of the file? Does C see NEW DATA\n, or does it see the old contents of the file, or perhaps an empty file? /Bellman
Re: open_ctree failed, unable to mount the fs
On 07.01.2011 20:46, cwillu wrote: However, I don't see the tool when I clone the latest git - am I missing something? It's not built by the makefile by default; make btrfs-select-super as stated above will make it. $ grep select Makefile $ grep super Makefile $ grep -r select-super * $ grep -r selects-super * I used: git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next -- Tomasz Chmielewski http://wpkg.org
Re: 'open_ctree failed', unable to mount the fs
On Fri, January 7, 2011 2:09 pm, Hugo Mills wrote: On Fri, Jan 07, 2011 at 08:01:47PM +0100, Tomasz Chmielewski wrote: I got a power cycle, after which I'm no longer able to mount btrfs filesystem: [...] The forthcoming[1] btrfsck tool should handle that particular error, I believe. I tried it with the btrfsck in the git repo (last week), and wound up with... a brandy-new, blank btrfs partition. Not *quite* what I was looking for. But at least it mounted. -K
Re: open_ctree failed, unable to mount the fs
On Fri, Jan 7, 2011 at 2:01 PM, Tomasz Chmielewski man...@wpkg.org wrote: On 07.01.2011 20:46, cwillu wrote: However, I don't see the tool when I clone the latest git - am I missing something? It's not built by the makefile by default; make btrfs-select-super as stated above will make it. $ grep select Makefile $ grep super Makefile $ grep -r select-super * $ grep -r selects-super * I used: git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next You checked out the master branch into a folder called next. -b next is the option to checkout a specific branch. From your existing checkout, git checkout -t origin/next will switch to that branch.
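For anyone else tripped up by this: a positional argument after the clone URL is the *directory* name, not a branch; -b is what selects a branch. A toy reproduction with a throwaway local repository (repository and branch names here are illustrative, standing in for the btrfs-progs URL and its next branch):

```shell
set -e
work=$(mktemp -d); cd "$work"

# build a small "upstream" repo with a default branch and a "next" branch
git init -q upstream
git -C upstream -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "base"
git -C upstream branch next

git clone -q upstream next                    # WRONG: clones the default branch
                                              # into a directory called "next"
git clone -q -b next upstream checkout-next   # RIGHT: -b picks the branch

git -C checkout-next rev-parse --abbrev-ref HEAD   # -> next
```

As cwillu says, an existing clone can simply switch with `git checkout -t origin/next`.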
Re: 'open_ctree failed', unable to mount the fs
On Fri, Jan 7, 2011 at 2:02 PM, Ken D'Ambrosio k...@jots.org wrote: On Fri, January 7, 2011 2:09 pm, Hugo Mills wrote: On Fri, Jan 07, 2011 at 08:01:47PM +0100, Tomasz Chmielewski wrote: I got a power cycle, after which I'm no longer able to mount btrfs filesystem: [...] The forthcoming[1] btrfsck tool should handle that particular error, I believe. I tried it with the btrfsck in the git repo (last week), and wound up with... a brandy-new, blank btrfs partition. Not *quite* what I was looking for. But at least it mounted. btrfsck in git hasn't been updated since October; the upcoming fsck work isn't public, presumably to avoid making things worse until it's working right. The current btrfsck doesn't write to the partition as far as I'm aware.
Re: open_ctree failed, unable to mount the fs
On 07.01.2011 21:18, cwillu wrote: You checked out the master branch into a folder called next. -b next is the option to checkout a specific branch. From your existing checkout, git checkout -t origin/next will switch to that branch. Good catch - thanks for the hint. The filesystem mounted and was usable. -- Tomasz Chmielewski http://wpkg.org
btrfsck segmentation fault
I have a 10TB btrfs filesystem over iSCSI that is currently unmountable. I'm currently running Fedora 13 with a recent Fedora 14 kernel (2.6.35.9-64.fc14.i686.PAE) and the system hung with messages like : parent transid verify failed on 5937615339520 wanted 48547 found 48542 I've rebooted and am attempting to recover with btrfsck from the btrfs-progs-unstable git tree, but it is segfaulting after finding a superblock and listing out 3 of the parent transid messages. Anyone have any ideas? I tried btrfsck /dev/sdb, btrfsck -s 1 /dev/sdb, and btrfsck -s 2 /dev/sdb with the same result for each. The btrfsck binary I compiled does work on a small (800MB) test btrfs file system. I suspect it may be due to the size of the filesystem I am trying to repair. Running btrfsck with gdb returns :
#0  find_first_block_group (root=0x8067178, path=0x80677f8, key=0xb24b) at extent-tree.c:3028
#1  0x08055603 in btrfs_read_block_groups (root=0x8067178) at extent-tree.c:3072
#2  0x08053009 in open_ctree_fd (fp=7, path=0xb63a "/dev/sdb", sb_bytenr=<value optimized out>, writes=0) at disk-io.c:760
#3  0x080530e8 in open_ctree (filename=0xb63a "/dev/sdb", sb_bytenr=0, writes=0) at disk-io.c:587
#4  0x0804d3fc in main (ac=<value optimized out>, av=<Cannot access memory at address 0x4>)
In any event, recovering the data would be nice and any ideas to do so would be appreciated. -- Andrew Schretter Systems Programmer, Duke University Dept. of Mathematics (919) 660-2866
Re: btrfsck segmentation fault
On Fri, Jan 7, 2011 at 3:15 PM, Andrew Schretter schr...@math.duke.edu wrote: I have a 10TB btrfs filesystem over iSCSI that is currently unmountable. I'm currently running Fedora 13 with a recent Fedora 14 kernel (2.6.35.9-64.fc14.i686.PAE) and the system hung with messages like : parent transid verify failed on 5937615339520 wanted 48547 found 48542 I've rebooted and am attempting to recover with btrfsck from the btrfs-progs-unstable git tree, but it is segfaulting after finding a superblock and listing out 3 of the parent transid messages. Anyone have any ideas? I tried btrfsck /dev/sdb, btrfsck -s 1 /dev/sdb, and btrfsck -s 2 /dev/sdb with the same result for each. The btrfsck binary I compiled does work on a small (800MB) test btrfs file system. I suspect it may be due to the size of the filesystem I am trying to repair. Segfaulting is what the current btrfsck does when it finds a problem; it doesn't try to fix anything yet.
Re: Offline Deduplication for Btrfs
On Thursday, January 06, 2011 01:35:15 pm Chris Mason wrote: What is the smallest granularity that the datadomain searches for in terms of dedup? Josef's current setup isn't restricted to a specific block size, but there is a min match of 4k. I talked to a few people I know and didn't get a clear answer either... However, 512 bytes came up more than once. I'm not really worried about the size of region to be used, but about offsetting it... it's so easy to create large tars, ... where the content is offset by a few bytes, multiples of 512 and such. Peter. -- Censorship: noun, circa 1591. a: Relief of the burden of independent thinking.
Re: Atomic file data replace API
On 01/07/2011 09:58 AM, Chris Mason wrote: Yes and no. We have a best effort mechanism where we try to guess that since you've done this truncate and the write that you want the writes to show up quickly. But it's a guess. It is a pretty good guess, and one that the NT kernel has been making for 15 years or so. I've been following this issue for some time and I still don't understand why Ted is so hostile to this and can't make it work right on ext4. When you get a rename() you just need to check if there are outstanding journal transactions and/or dirty cache pages, and hang the rename() transaction on the end of those. That way if the system crashes after the new file has fully hit the disk, the old file is gone and you only have the new one, but if it crashes before, you still have the old one in place. Both the writes and the rename can be delayed in the cache to an arbitrary point in the future; what matters is that their order is preserved.