[patch 5/6] reiserfs v3 patches
reiserfs: journal_transaction_should_end should increase the count of blocks allocated so the transaction subsystem can keep new writers from creating a transaction that is too large.

diff -r 890bf922a629 fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 14:00:50 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 14:01:36 2006 -0500
@@ -2854,6 +2854,9 @@ int journal_transaction_should_end(struc
 	    journal->j_cnode_free < (journal->j_trans_max * 3)) {
 		return 1;
 	}
+	/* protected by the BKL here */
+	journal->j_len_alloc += new_alloc;
+	th->t_blocks_allocated += new_alloc;
 	return 0;
 }
--
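The accounting this patch adds can be sketched in plain userspace C. This is only an illustration, not kernel code: the structs are stand-ins, and the threshold check is simplified. The point is that even when the answer is "keep going", the reservation is recorded so later callers see the transaction's true size.

```c
#include <assert.h>

/* Stand-ins for the kernel structures; field names mirror the patch
 * but the types and layout here are illustrative only. */
struct journal_sketch {
	unsigned long j_len_alloc;  /* blocks reserved by all writers */
	unsigned long j_trans_max;  /* max blocks per transaction */
	unsigned long j_cnode_free; /* free commit nodes */
};

struct handle_sketch {
	unsigned long t_blocks_allocated;
};

/* Returns 1 if the caller should end the transaction, 0 otherwise.
 * The fix: on the "keep going" path, add new_alloc to both counters
 * so the next writer sees an accurate transaction size. */
int transaction_should_end(struct journal_sketch *j,
			   struct handle_sketch *th,
			   unsigned long new_alloc)
{
	if (j->j_len_alloc + new_alloc >= j->j_trans_max ||
	    j->j_cnode_free < j->j_trans_max * 3)
		return 1;
	j->j_len_alloc += new_alloc;        /* the lines the patch adds */
	th->t_blocks_allocated += new_alloc;
	return 0;
}
```

Without the two increments, every caller would be told "plenty of room" against a stale j_len_alloc, and the transaction could grow far past j_trans_max.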
[patch 6/6] reiserfs v3 patches
When a filesystem has been converted from 3.5.x to 3.6.x, we need an extra check during file write to make sure we are not trying to make a 3.5.x file larger than 2GB.

diff -r ee81eb208598 fs/reiserfs/file.c
--- a/fs/reiserfs/file.c	Fri Jan 13 14:01:37 2006 -0500
+++ b/fs/reiserfs/file.c	Fri Jan 13 14:08:12 2006 -0500
@@ -1285,6 +1285,23 @@ static ssize_t reiserfs_file_write(struc
 	struct reiserfs_transaction_handle th;
 	th.t_trans_id = 0;
 
+	/* If a filesystem is converted from 3.5 to 3.6, we'll have v3.5 items
+	 * lying around (most of the disk, in fact).  Despite the filesystem
+	 * now being a v3.6 format, the old items still can't support large
+	 * file sizes.  Catch this case here, as the rest of the VFS layer is
+	 * oblivious to the different limitations between old and new items.
+	 * reiserfs_setattr catches this for truncates.  This chunk is lifted
+	 * from generic_write_checks. */
+	if (get_inode_item_key_version(inode) == KEY_FORMAT_3_5 &&
+	    *ppos + count > MAX_NON_LFS) {
+		if (*ppos >= MAX_NON_LFS) {
+			send_sig(SIGXFSZ, current, 0);
+			return -EFBIG;
+		}
+		if (count > MAX_NON_LFS - (unsigned long)*ppos)
+			count = MAX_NON_LFS - (unsigned long)*ppos;
+	}
+
 	if (file->f_flags & O_DIRECT) {	// Direct IO needs treatment
 		ssize_t result, after_file_end = 0;
 		if ((*ppos + count >= inode->i_size)
--
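The clamp-or-fail logic above is easy to model in userspace. A minimal sketch, assuming MAX_NON_LFS is the kernel's 2GB-minus-one limit (the function name and signature here are made up; the kernel version also raises SIGXFSZ, which is just noted in a comment):

```c
#include <assert.h>
#include <errno.h>

/* Userspace model of the kernel's MAX_NON_LFS, (1UL << 31) - 1. */
#define SKETCH_MAX_NON_LFS ((1UL << 31) - 1)

/* Given a write of *count bytes at offset *ppos against an old-format
 * (v3.5) file: fail with -EFBIG if the offset is already at the limit,
 * otherwise clamp the count so the file never crosses the 2GB boundary.
 * Returns 0 on success (possibly with *count reduced). */
int clamp_v35_write(unsigned long long *ppos, unsigned long *count)
{
	if (*ppos + *count > SKETCH_MAX_NON_LFS) {
		if (*ppos >= SKETCH_MAX_NON_LFS)
			return -EFBIG; /* kernel also sends SIGXFSZ here */
		if (*count > SKETCH_MAX_NON_LFS - (unsigned long)*ppos)
			*count = SKETCH_MAX_NON_LFS - (unsigned long)*ppos;
	}
	return 0;
}
```

A short write (the clamped count) is the normal POSIX behavior at a file-size limit; only a write that can make no progress at all returns -EFBIG.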
[patch 3/6] reiserfs v3 patches
In data=journal mode, reiserfs writepage needs to make sure not to trigger transactions while being run under PF_MEMALLOC. This patch makes sure to redirty the page instead of forcing a transaction start in this case.

Also, calling filemap_fdata* in order to trigger io on the block device can cause lock inversions on the page lock. Instead, do simple batching from flush_commit_list.

diff -r c10585019f18 fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c	Fri Jan 13 13:51:10 2006 -0500
+++ b/fs/reiserfs/inode.c	Fri Jan 13 13:55:09 2006 -0500
@@ -2363,6 +2363,13 @@ static int reiserfs_write_full_page(stru
 	int bh_per_page = PAGE_CACHE_SIZE / s->s_blocksize;
 	th.t_trans_id = 0;
 
+	/* no logging allowed when nonblocking or from PF_MEMALLOC */
+	if (checked && (current->flags & PF_MEMALLOC)) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return 0;
+	}
+
 	/* The page dirty bit is cleared before writepage is called, which
 	 * means we have to tell create_empty_buffers to make dirty buffers
 	 * The page really should be up to date at this point, so tossing

diff -r c10585019f18 fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 13:51:10 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 13:55:09 2006 -0500
@@ -990,6 +990,7 @@ static int flush_commit_list(struct supe
 	struct reiserfs_journal *journal = SB_JOURNAL(s);
 	int barrier = 0;
 	int retval = 0;
+	int write_len;
 
 	reiserfs_check_lock_depth(s, "flush_commit_list");
 
@@ -1039,16 +1040,24 @@ static int flush_commit_list(struct supe
 	BUG_ON(!list_empty(&jl->j_bh_list));
 	/*
 	 * for the description block and all the log blocks, submit any buffers
-	 * that haven't already reached the disk
+	 * that haven't already reached the disk.  Try to write at least 256
+	 * log blocks.  Later on, we will only wait on blocks that correspond
+	 * to this transaction, but while we're unplugging we might as well
+	 * get a chunk of data on there.
 	 */
 	atomic_inc(&journal->j_async_throttle);
-	for (i = 0; i < (jl->j_len + 1); i++) {
+	write_len = jl->j_len + 1;
+	if (write_len < 256)
+		write_len = 256;
+	for (i = 0; i < write_len; i++) {
 		bn = SB_ONDISK_JOURNAL_1st_BLOCK(s) +
 		    (jl->j_start + i) % SB_ONDISK_JOURNAL_SIZE(s);
 		tbh = journal_find_get_block(s, bn);
-		if (buffer_dirty(tbh))	/* redundant, ll_rw_block() checks */
-			ll_rw_block(SWRITE, 1, &tbh);
-		put_bh(tbh);
+		if (tbh) {
+			if (buffer_dirty(tbh))
+				ll_rw_block(WRITE, 1, &tbh);
+			put_bh(tbh);
+		}
 	}
 	atomic_dec(&journal->j_async_throttle);
--
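The batching arithmetic in flush_commit_list is worth spelling out: the loop now covers at least 256 blocks, and each index is mapped into the circular on-disk journal with a modulo. A userspace sketch (function names and the 256 constant are taken from the patch; everything else is illustrative):

```c
#include <assert.h>

#define SKETCH_MIN_BATCH 256

/* How many journal blocks to submit for a transaction of j_len log
 * blocks: the description block plus the log blocks, padded up to a
 * minimum batch so each unplug pushes a useful chunk of IO. */
unsigned long batch_len(unsigned long j_len)
{
	unsigned long write_len = j_len + 1;
	if (write_len < SKETCH_MIN_BATCH)
		write_len = SKETCH_MIN_BATCH;
	return write_len;
}

/* On-disk block number for index i of a transaction that starts at
 * journal-relative offset j_start, in a circular journal of
 * journal_size blocks whose first on-disk block is first_block. */
unsigned long journal_block(unsigned long first_block,
			    unsigned long j_start,
			    unsigned long journal_size,
			    unsigned long i)
{
	return first_block + (j_start + i) % journal_size;
}
```

The padding is also why the loop now needs the NULL check on journal_find_get_block: past the end of this transaction there may simply be no buffer head for a block, which the old exact-length loop could assume never happened.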
[patch 4/6] reiserfs v3 patches
write_ordered_buffers should handle dirty non-uptodate buffers without a BUG().

diff -r 18fa5554d7e2 fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 13:55:10 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 14:00:49 2006 -0500
@@ -848,6 +848,14 @@ static int write_ordered_buffers(spinloc
 			spin_lock(lock);
 			goto loop_next;
 		}
+		/* in theory, dirty non-uptodate buffers should never get here,
+		 * but the upper layer io error paths still have a few quirks.
+		 * Handle them here as gracefully as we can
+		 */
+		if (!buffer_uptodate(bh) && buffer_dirty(bh)) {
+			clear_buffer_dirty(bh);
+			ret = -EIO;
+		}
 		if (buffer_dirty(bh)) {
 			list_del_init(&jh->list);
 			list_add(&jh->list, &tmp);
@@ -1032,9 +1040,12 @@ static int flush_commit_list(struct supe
 	}
 
 	if (!list_empty(&jl->j_bh_list)) {
+		int ret;
 		unlock_kernel();
-		write_ordered_buffers(&journal->j_dirty_buffers_lock,
-				      journal, jl, &jl->j_bh_list);
+		ret = write_ordered_buffers(&journal->j_dirty_buffers_lock,
+					    journal, jl, &jl->j_bh_list);
+		if (ret < 0 && retval == 0)
+			retval = ret;
 		lock_kernel();
 	}
 	BUG_ON(!list_empty(&jl->j_bh_list));
--
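The two behaviors this patch introduces are a drop-with-error instead of a BUG(), and first-error-wins propagation in the caller. A minimal userspace model (struct and helper names are made up; the kernel uses buffer-head flag bits, not plain ints):

```c
#include <assert.h>
#include <errno.h>

struct buf_sketch {
	int dirty;
	int uptodate;
};

/* Returns 0 if the buffer may be written normally, or -EIO if its
 * dirty bit had to be discarded: a dirty buffer whose contents never
 * became uptodate must not be written (that would put garbage on
 * disk), so we drop the dirty bit and report the error instead of
 * crashing with BUG(). */
int check_ordered_buffer(struct buf_sketch *bh)
{
	if (!bh->uptodate && bh->dirty) {
		bh->dirty = 0;
		return -EIO;
	}
	return 0;
}

/* First-error-wins accumulation, as flush_commit_list now does with
 * retval: later failures never overwrite the first one recorded. */
void record_error(int *retval, int ret)
{
	if (ret < 0 && *retval == 0)
		*retval = ret;
}
```

Before the second hunk, write_ordered_buffers' return value was simply discarded, so an IO error in the ordered list could never reach the caller of flush_commit_list.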
[patch 2/6] reiserfs v3 patches
The b_private field in buffer heads needs to be zero filled when the buffers are allocated. Thanks to Nathan Scott for finding this. It was causing problems on systems with both XFS and reiserfs.

diff -r 5ef1fa0a021a fs/buffer.c
--- a/fs/buffer.c	Fri Jan 13 13:50:39 2006 -0500
+++ b/fs/buffer.c	Fri Jan 13 13:51:09 2006 -0500
@@ -1022,6 +1022,7 @@ try_again:
 		bh->b_state = 0;
 		atomic_set(&bh->b_count, 0);
+		bh->b_private = NULL;
 		bh->b_size = size;
 
 		/* Link the buffer to its page */
--
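The bug class here is a reused object carrying a stale filesystem-private pointer: buffer heads come from a shared pool, so a value left behind by one filesystem (XFS) can be misread by the next user (reiserfs). A tiny sketch of the initialization discipline, with a stand-in struct rather than the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's buffer_head; only the fields the fix
 * touches are modeled here. */
struct bh_sketch {
	unsigned long b_state;
	unsigned long b_size;
	void *b_private; /* filesystem-private pointer */
};

/* What the allocation path must do: reset every field a previous
 * user may have set, including b_private (the one-line fix). */
void init_bh(struct bh_sketch *bh, unsigned long size)
{
	bh->b_state = 0;
	bh->b_private = NULL;
	bh->b_size = size;
}
```

Without the NULL assignment, reiserfs code that treats a non-NULL b_private as one of its own journal heads would dereference whatever XFS last stored there.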
[patch 1/6] reiserfs v3 patches
After a transaction has closed but before it has finished commit, there is a window where data=ordered mode requires invalidatepage to pin pages instead of freeing them. This patch fixes a race between the invalidatepage checks and data=ordered writeback, and it also adds a check to the reiserfs write_ordered_buffers routine to write any anonymous buffers that were dirtied after its first writeback loop.

The bug works like this:

proc1: transaction closes and a new one starts
proc1: write_ordered_buffers starts processing data=ordered list
proc1: buffer A is cleaned and written
proc2: buffer A is dirtied by another process
proc2: file is truncated to zero, page A goes through invalidatepage
proc2: reiserfs_invalidatepage sees dirty buffer A with reiserfs journal head, pins it
proc1: write_ordered_buffers frees the journal head on buffer A

At this point, buffer A stays dirty forever.

diff -r 21be96fa294a fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c	Fri Jan 13 13:48:03 2006 -0500
+++ b/fs/reiserfs/inode.c	Fri Jan 13 13:50:37 2006 -0500
@@ -2743,6 +2743,7 @@ static int invalidatepage_can_drop(struc
 	int ret = 1;
 	struct reiserfs_journal *j = SB_JOURNAL(inode->i_sb);
 
+	lock_buffer(bh);
 	spin_lock(&j->j_dirty_buffers_lock);
 	if (!buffer_mapped(bh)) {
 		goto free_jh;
@@ -2758,7 +2759,7 @@ static int invalidatepage_can_drop(struc
 		if (buffer_journaled(bh) || buffer_journal_dirty(bh)) {
 			ret = 0;
 		}
-	} else if (buffer_dirty(bh) || buffer_locked(bh)) {
+	} else if (buffer_dirty(bh)) {
 		struct reiserfs_journal_list *jl;
 		struct reiserfs_jh *jh = bh->b_private;
 
@@ -2784,6 +2785,7 @@ static int invalidatepage_can_drop(struc
 		reiserfs_free_jh(bh);
 	}
 	spin_unlock(&j->j_dirty_buffers_lock);
+	unlock_buffer(bh);
 	return ret;
 }

diff -r 21be96fa294a fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 13:48:03 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 13:50:37 2006 -0500
@@ -878,6 +878,19 @@ static int write_ordered_buffers(spinloc
 		}
 		if (!buffer_uptodate(bh)) {
 			ret = -EIO;
 		}
+		/* ugly interaction with invalidatepage here.
+		 * reiserfs_invalidate_page will pin any buffer that has a valid
+		 * journal head from an older transaction.  If someone else sets
+		 * our buffer dirty after we write it in the first loop, and
+		 * then someone truncates the page away, nobody will ever write
+		 * the buffer.  We're safe if we write the page one last time
+		 * after freeing the journal header.
+		 */
+		if (buffer_dirty(bh) && unlikely(bh->b_page->mapping == NULL)) {
+			spin_unlock(lock);
+			ll_rw_block(WRITE, 1, &bh);
+			spin_lock(lock);
+		}
 		put_bh(bh);
 		cond_resched_lock(lock);
--
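The invalidatepage side of the fix can be summarized as a pure decision function: with the buffer lock now held (so the state cannot change mid-check), only a genuinely dirty buffer with a journal head from a still-committing transaction pins the page, and the old "or locked" escape hatch is gone. This is a deliberately simplified model of invalidatepage_can_drop; every name and flag below is illustrative, not the kernel's:

```c
#include <assert.h>

/* Snapshot of the buffer state, taken while holding the buffer lock. */
struct drop_query {
	int mapped;          /* buffer is mapped to disk */
	int journaled;       /* part of the currently running transaction */
	int dirty;           /* buffer is dirty */
	int jh_in_old_trans; /* journal head from an older, uncommitted
	                      * transaction (data=ordered) */
};

/* Returns 1 if invalidatepage may drop the page, 0 if it must pin it
 * until the older transaction finishes committing. */
int can_drop(const struct drop_query *q)
{
	if (!q->mapped)
		return 1; /* nothing on disk depends on it */
	if (q->journaled)
		return 0; /* running transaction still needs it */
	if (q->dirty && q->jh_in_old_trans)
		return 0; /* ordered data not yet on disk */
	return 1;
}
```

The race in the changelog lived in the gap between reading these flags and acting on them; taking the buffer lock around the whole check (and dropping the buffer_locked() test, which the lock makes meaningless) closes that gap.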
[patch 0/6] reiserfs v3 patches
Hello everyone,

Here is my current queue of reiserfs patches. These originated from various bugs solved in the SUSE SLES9 kernel, and have been ported to 2.6.15-git9.

-chris
--
Re: Data being corrupted on reiserfs 3.6
Thanks,

Pierre Etchemaïté wrote:
> On Sun, 15 Jan 2006 21:36:20 +, Michael Barnwell <[EMAIL PROTECTED]> wrote:
> > I'm not sure how to search through a binary file for non-zero bytes.
>
> cmp -b ?

[EMAIL PROTECTED]:~$ cmp -b /tmp/1GB.tst /home/michael/1GB.tst
/tmp/1GB.tst /home/michael/1GB.tst differ: byte 68494094, line 1 is 0 ^@ 40

That seems to stop after the first difference, so I did:

[EMAIL PROTECTED]:~$ cmp -bl /tmp/1GB.tst /home/michael/1GB.tst | wc -l
243

The full output of cmp -bl is at http://pastebin.com/507389

Regards,
Michael.
Re: Data being corrupted on reiserfs 3.6
On Sun, 15 Jan 2006 21:36:20 +, Michael Barnwell <[EMAIL PROTECTED]> wrote:
> I'm not sure how to search through a binary file for non-zero bytes.

cmp -b ?
Re: Data being corrupted on reiserfs 3.6
Hi,

Jan Kara wrote:
> Hmm, that is really strange. Do the files have the same size? Do you get
> an error also if you just create a file full of zeros? If so, what do the
> differences look like (e.g. any signs of flipped bits or so)?

[EMAIL PROTECTED]:/tmp$ dd bs=1024 count=1000k if=/dev/zero of=./1GB.tst
1024000+0 records in
1024000+0 records out
1048576000 bytes transferred in 61.578769 seconds (17028207 bytes/sec)

[EMAIL PROTECTED]:/tmp$ ls -l 1GB.tst
-rw-r--r-- 1 michael michael 1048576000 2006-01-15 20:51 1GB.tst

[EMAIL PROTECTED]:/tmp$ md5sum 1GB.tst
e5c834fbdaa6bfd8eac5eb9404eefdd4  1GB.tst

[EMAIL PROTECTED]:/tmp$ ls -l /home/michael/1GB.tst
-rw-r--r-- 1 michael michael 1048576000 2006-01-15 20:54 /home/michael/1GB.tst

[EMAIL PROTECTED]:/tmp$ md5sum /home/michael/1GB.tst
92c51557041ebd6424b4467a878c9f44  /home/michael/1GB.tst

I looked at the file /home/michael/1GB.tst with xxd for about 5 minutes but couldn't see anything but zeros - I'm not sure how to search through a binary file for non-zero bytes.

So yes, there is an error even if the file is all zeros, and the files have the same size.

Thanks,
Michael Barnwell.
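Michael mentions not knowing how to search a binary file for non-zero bytes. Besides `cmp -b` against a known-zero file, a few lines of C do it directly. This is a generic sketch (the function name and file paths are made up for illustration):

```c
#include <assert.h>
#include <stdio.h>

/* Scan a file byte by byte; return the offset of the first non-zero
 * byte, -1 if the file is entirely zeros, or -2 if it can't be opened.
 * stdio's own buffering makes the byte-at-a-time loop fast enough for
 * a quick check like this. */
long first_nonzero_offset(const char *path)
{
	FILE *f = fopen(path, "rb");
	long off = 0;
	int c;

	if (!f)
		return -2;
	while ((c = fgetc(f)) != EOF) {
		if (c != 0) {
			fclose(f);
			return off;
		}
		off++;
	}
	fclose(f);
	return -1;
}
```

For a supposedly all-zero 1GB file like the one above, a non-negative return value pinpoints exactly where the corruption starts, which is more direct than paging through xxd output.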
Re: Data being corrupted on reiserfs 3.6
Hello,

> I'm experiencing data corruption when creating or copying data to my
> reiserfs 3.6 partition mounted under /home. The following extract gives
> a pretty clear indication that it's getting corrupted somewhere.
>
> [EMAIL PROTECTED]:/tmp$ mount
> /dev/md0 on / type ext3 (rw,errors=remount-ro)
> proc on /proc type proc (rw)
> sysfs on /sys type sysfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> tmpfs on /dev/shm type tmpfs (rw)
> usbfs on /proc/bus/usb type usbfs (rw)
> tmpfs on /dev type tmpfs (rw,size=10M,mode=0755)
> /dev/md2 on /home type reiserfs (rw)
>
> [EMAIL PROTECTED]:/tmp$ dd bs=1024 count=1000k if=/dev/urandom of=./1GB.tst
> 1024000+0 records in
> 1024000+0 records out
> 1048576000 bytes transferred in 231.749782 seconds (4524604 bytes/sec)
>
> [EMAIL PROTECTED]:/tmp$ md5sum 1GB.tst
> 48f46744c7e50c42c061a00d11541a85  1GB.tst
>
> [EMAIL PROTECTED]:/tmp$ cp 1GB.tst /home/michael/
>
> [EMAIL PROTECTED]:/tmp$ md5sum /home/michael/1GB.tst
> 042d8c462882f848412679e3cea03fe2  /home/michael/1GB.tst

Hmm, that is really strange. Do the files have the same size? Do you get an error also if you just create a file full of zeros? If so, what do the differences look like (e.g. any signs of flipped bits or so)?

> I'm running Debian Sarge on an Athlon XP 2200+, /dev/md2 is made up of
> four 400GB SATA hard disks on a Silicon Image 3114 controller in RAID 5.
> Dmesg is showing no errors whatsoever, the RAID array has been stable
> since I installed it a couple of weeks ago and the drive was formatted
> with mkfs.reiserfs with no special options.
>
> [EMAIL PROTECTED]:/tmp$ uname -a
> Linux biggs 2.6.8-2-k7 #1 Tue Aug 16 14:00:15 UTC 2005 i686 GNU/Linux

Any chance of trying some newer kernel? 2.6.8 is really old...

Honza
Re: reiserfsck --rebuild-tree aborts at same block
> I have a situation where if I run "reiserfsck --rebuild-tree" multiple
> times, it always aborts at the same block. The output includes
> "Send us the bug report only if the second run dies at the same place
> with the same block number."
>
> Before sending a bunch of info to the wrong place though, could someone
> please confirm if I should submit details here as a bug report, or would
> this be something to go through the support channel with?

First check that you have the latest version of reiserfsck. If so, then this is the appropriate list for the report.

Honza
reiserfsck --rebuild-tree aborts at same block
I have a situation where if I run "reiserfsck --rebuild-tree" multiple times, it always aborts at the same block. The output includes "Send us the bug report only if the second run dies at the same place with the same block number." Before sending a bunch of info to the wrong place though, could someone please confirm if I should submit details here as a bug report, or would this be something to go through the support channel with? Thanks