RE: Error messages.
Originally, I was running -W0 with fsync(2) being used to ensure data integrity. I'm presently testing lk 2.4.19 + Namesys patches 1 through 13 + Chris Mason's write barrier patch with hdparm -W1 and fsync(2). Under this configuration I don't see the problem you are encountering, but I am investigating data corruption on the ReiserFS partitions.

-----Original Message-----
From: Anders Widman [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 06, 2003 9:20 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Error messages.

> Anders, here is what I have and it works on thousands of duplicate
> servers:
> Tyan S2420 with 1.0GHz PIII
> 512MB RAM
> Promise PDC20269 in PCI1

Using PDC20268

> Intel Dual 10/100 NIC in PCI2
> Four Maxtor 250GB IDE drives off of the Promise controller
> lk 2.4.19 on RH7.3
> hdparm -a64 -K1 -W1 -u1 -m16 -c1 -d1 /dev/hd

hm.. The big difference I see is that I normally use -c3.
RE: Error messages.
Cable length is similar to mine. The PDC20268 will only go to UDMA 5. I haven't done any testing with this controller; I needed the PDC20269's UDMA 6 capability.

-----Original Message-----
From: Anders Widman [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 06, 2003 8:52 AM
To: [EMAIL PROTECTED]
Subject: Re: Error messages.

> That's rather puzzling... I did not have the same problems with the
> mii driver; however, I was unable to run the full extent of the 250GB drive
> or UDMA level 6 with mii under 2.4.13, so I was using a special patched
> driver from Promise to support both the pdc20269 and 48LBA. In 2.4.19 the
> 48 LBA support was added, so I was able to get the full address range on the
> 250GB drives without patches from Promise; however, I was still unable to
> run UDMA level 6 on the onboard Intel chip.

UDMA6 works on the machine with the VIA KT400 chip and a 2.4.21 kernel. The other machines are limited to ATA-100, as the controllers do not support higher. Actually, I do not need high DMA; DMA-33 should be enough. The errors come even with DMA turned off, though. It seems, at least so far, that the system crashes/lockups come much more often with DMA than without.

> I still use the Promise pdc20269 and run UDMA level 6 on thousands
> of deployed servers at this time. What is the cable length from drives to
> controller? Even though you have several configured servers, I have thousands
> without the problem you are seeing. Yes, I do get an occasional status error
> under heavy loads, but they've always been recoverable and the systems
> continue to chug along.

Cables are between 40-45cm / 15.5-17in.

PGP public key: https://tnonline.net/secure/pgp_key.txt
RE: Error messages.
Anders, here is what I have and it works on thousands of duplicate servers:

Tyan S2420 with 1.0GHz PIII
512MB RAM
Promise PDC20269 in PCI1
Intel Dual 10/100 NIC in PCI2
Four Maxtor 250GB IDE drives off of the Promise controller
lk 2.4.19 on RH7.3
hdparm -a64 -K1 -W1 -u1 -m16 -c1 -d1 /dev/hd

Regards, Wayne.

-----Original Message-----
From: Anders Widman [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 06, 2003 3:46 AM
To: [EMAIL PROTECTED]
Subject: Re: Error messages.

> On Wed, 2003-03-05 at 21:51, Anders Widman wrote:
>> > On Wed, Mar 05, 2003 at 08:18:18PM +0100, Anders Widman wrote:
>> New Promise controllers
>> PDC20268 (Ultra 100Tx2)
> does that mean you only tested on these pdc's ?

I changed from three Ultra100 boards to Ultra100Tx2. Now I only use two boards in this particular system.

> If so, then drop this damn PDC controller and get one that is
> supported under linux (e.g. hpt370 based controllers).
> I had the very same problems with these PDC20268 controllers. When I
> switched to anything above MDMA0 (note: not even UDMA) the system was
> freezing from time to time.

This happens here too...

> On the internal controller your drives should work all fine (via/intel
> chipsets work nicely), also on hpt based chipsets, and cmd is also
> supporting linux... but forget about promise. This company just does not
> support linux.
> I was using kernels 2.4.19/20/21pre1/21pre4/21pre4-ac5 and all had the
> very same problem. When I heard from others that they had problems with
> promise I switched... and I am now enjoying a rock stable system.

It might just have to come to this, but I do not want to buy new hardware :)

> Soeren.

PGP public key: https://tnonline.net/secure/pgp_key.txt
RE: Error messages.
That's rather puzzling... I did not have the same problems with the mii driver; however, I was unable to run the full extent of the 250GB drive or UDMA level 6 with mii under 2.4.13, so I was using a special patched driver from Promise to support both the pdc20269 and 48LBA. In 2.4.19 the 48 LBA support was added, so I was able to get the full address range on the 250GB drives without patches from Promise; however, I was still unable to run UDMA level 6 on the onboard Intel chip.

I still use the Promise pdc20269 and run UDMA level 6 on thousands of deployed servers at this time. What is the cable length from drives to controller? Even though you have several configured servers, I have thousands without the problem you are seeing. Yes, I do get an occasional status error under heavy loads, but they've always been recoverable and the systems continue to chug along.

-----Original Message-----
From: Anders Widman [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 06, 2003 3:44 AM
To: [EMAIL PROTECTED]
Subject: Re: Error messages.

> Hello!
> On Thu, Mar 06, 2003 at 09:32:38AM +0100, Anders Widman wrote:
>> > And for this case I am sure this was a scratchy CD-ROM disk in my CD-ROM drive.
>> Well, I have no CD-ROM. :)
> /dev/hdg is one of my CD-ROMs ;)
>> > Probably same stuff can be get when drive is busy remapping bad sectors?
>> > Use smartctl to find out how these messages correlate with remapped bad sector counts?
>> Very strange. That would mean all of my harddrives are broken, or on
>> their way to breaking. I do not believe that. Most of the
>> hardware, including the cabling, has been replaced and changed.
> Well, it seems, as Wayne has noticed, you have one common part:
> Promise controllers. How about using a different kind of controller
> in one of the boxes and seeing if it helps?

Perhaps, but the same happens on the internal controller. In fact, the internal controller (either VIA or Intel) causes the system to freeze when it happens too many times.

> Bye,
> Oleg

PGP public key: https://tnonline.net/secure/pgp_key.txt
RE: reiserfsprogs 3.6.5-pre2 release.
Vitaly, how does "check-followed-by-fixable" in 3.6.3 compare to "-a" at boot in 3.6.5? Regards, Wayne.

-----Original Message-----
From: Vitaly Fertman [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 25, 2003 8:50 AM
To: [EMAIL PROTECTED]
Subject: reiserfsprogs 3.6.5-pre2 release.

Hi all! This new pre-release includes:

- A critical bug on pass0 of rebuild-tree, an overflow while checking unformatted item offsets, was fixed.
- A bug in relocation of shared object ids - the entry key sometimes did not get updated correctly with the new key - was fixed.
- A bug in bitops operations for big-endian machines was fixed.
- A bug with the superblock being overwritten during journal replay was fixed.
- While opening the journal, check that the journal parameters in the superblock and in the journal header match; advise running rebuild-sb if not. While rebuilding the superblock, do the same check and ask the user whether to rebuild the journal header, continue without the journal, or change the start of the partition before using reiserfsck.
- Check that all invalid bits of the bitmap are set to 1; set them correctly.
- fix-fixable does not relocate shared object ids anymore, as that is too complex for fix-fixable; only rebuild-tree does it.
- reiserfsck -a (started at boot) replays the journal, checks the error flags in the superblock, the bitmaps, the fs size, and 2 levels of the internal tree, and switches to fixable mode if any problem is detected. For the root fs, fixable cannot be performed (as the fs is mounted) and just --check will be done.
- Journal replay was improved: a) check whether blocks could be journalable before replaying; b) replay only transactions whose trans_id == last replayed transaction's trans_id + 1.
- Warning messages were improved.

-- 
Thanks, Vitaly Fertman
[reiserfs-list] Data Shredding on a Journal Filesystem
Hello fellow ReiserFS fans. I'm in search of a data shredder for use on reiserfs and am wondering if anyone knows of one. It would need to effectively remove, both from the journal and from the disk itself, any and all traces of a file's data. I'm not sure, but I thought at one time Hans was talking about something like this himself. I have looked at shred(1), but it does not work with journalling filesystems. Most appreciatively, Wayne
RE: [reiserfs-list] RE: lk 2.4.19 ReiserFS Build
Oleg, does this include the "speedup" series both you and Chris have been working on? Regards, Wayne.

-----Original Message-----
From: Oleg Drokin [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 13, 2002 1:34 AM
To: Manuel Krause
Cc: reiserfs-list
Subject: Re: [reiserfs-list] RE: lk 2.4.19 ReiserFS Build

Hello!

On Tue, Aug 13, 2002 at 09:17:06AM +0400, Oleg Drokin wrote:
> > On 08/12/2002 08:21 PM, [EMAIL PROTECTED] wrote:
> > > Thanks Chris. Should I pull the 2.4.20-pre series from Namesys?
> >                                   ^^^
> > > this special Namesys series
> bk://thebsh.namesys.com/bk/reiser3-linux-2.4
> But it won't last there for long since it was not accepted by Marcelo it seems.

Oh, wait, it was indeed accepted by Marcelo, as I see now. So you can pull it from Marcelo's tree as well. I am still going to port it to 2.5, though.

Bye,
Oleg
RE: [reiserfs-list] RE: lk 2.4.19 ReiserFS Build
Hi Manuel. That was one of the problems I was hoping to find an answer for, but it appears, as Oleg has mentioned, the patches are being tested in 2.5 first before Marcelo will accept them in 2.4. Regards, Wayne.

-----Original Message-----
From: Manuel Krause [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 13, 2002 12:06 AM
To: [EMAIL PROTECTED]
Cc: reiserfs-list
Subject: Re: [reiserfs-list] RE: lk 2.4.19 ReiserFS Build

Hi Wayne!

Please let me know immediately, when you've received this mail, where you've found:

On 08/12/2002 08:21 PM, [EMAIL PROTECTED] wrote:
> Thanks Chris. Should I pull the 2.4.20-pre series from Namesys?
>                                 ^^^
> this special Namesys series
> These were to be the speedup series, if memory serves me correctly. I'd like
> to get the best performance possible for our next production run.
> Wayne.
> [snip]

Thank you in advance, ;-)

Sorry, Wayne, I know what you mean. Your previously reviewed patchset from one of my latest mails works fine here for me until now (but including the not-mentioned rml-preempt-patch for the latest -rc- and for non-server usage).

Best regards,
Manuel
RE: [reiserfs-list] fsync() Performance Issue
I'll add the write caching into the test just for info. Until there is a way to guarantee the data is safe, I'll have to go with no write caching, though. I should have all this testing done by the end of the week.

-----Original Message-----
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 03, 2002 6:00 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [reiserfs-list] fsync() Performance Issue

On Fri, 2002-05-03 at 16:35, [EMAIL PROTECTED] wrote:
> Chris, I have some quick preliminary results for you. I have
> additional testing to perform and haven't run debugreiserfs() yet. If you
> have a preference for which tests to run debugreiserfs() on, let me know.
> Base testing was done against 2.4.13 built on RH 7.1 using the
> test_writes.c code I forwarded to you. The system is a Tyan with a single
> PIII, IDE Promise 20269, Maxtor 160GB drive - write cache disabled. All
> numbers are with fsync() and 1KB files. As I said, more testing, i.e.
> file sizes, needs to be performed.
> 2.4.19-pre7 speedup, data logging, write barrier / no options
> => 47.1ms/file

Hi Wayne, thanks for sending these along. I expected a slight improvement over the 2.4.13 code even with the data logging turned off. I'm curious to see how it does with the IDE cache turned on. With scsi, I see 10-15% better without any options than an unpatched kernel.

> 2.4.19-pre7 speedup, data logging, write barrier / data=journal
> => 25.2ms/file
> 2.4.19-pre7 speedup, data logging, write barrier / data=journal,barrier=none
> => 27.8ms/file

The barrier option doesn't make much difference because the write cache is off. With write cache on, the barrier code should allow you to be faster than with the caching off, but without risking the data (Jens and I are working on final fsync safety issues, though). Hans, data=journal turns on the data journaling.

The data journaling patches also include optimizations to write metadata back to disk in bigger chunks for tiny transactions (the current method is to write one transaction's worth back; when a transaction has 3 blocks, this is pretty slow). I've put these patches up on: ftp.suse.com/pub/people/mason/patches/data-logging

> One question is will these patches be going into the 2.4 tree and
> when?

The data logging patches are a huge change, but the good news is they are based on the nesting patches that have been stable for a long time in the quota code. I'll probably want a month or more of heavy testing before I think about submitting them.

-chris
RE: [reiserfs-list] fsync() Performance Issue
Thanks. I'll start putting this one into test. Wayne.

-----Original Message-----
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 30, 2002 10:28 AM
To: Oleg Drokin
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [reiserfs-list] fsync() Performance Issue

On Tue, 2002-04-30 at 10:20, Oleg Drokin wrote:
> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason).
> The filesystem cannot do very much at this point, unfortunately; it ends up
> waiting for the disk to finish write operations.
>
> Also we are working on other speedup patches that would cover different areas
> of write performance itself.

A newer one (against 2.4.19-pre7) is below. It has not been through as much testing on the namesys side, which is why Oleg sent the older one. Wayne and I have been talking in private mail; he's getting a bunch of beta patches later today (this speedup, data logging, updated barrier code), along with instructions for testing.

-chris

# Veritas (Hugh Dickins supplied the patch) sent the bits in
# fs/super.c that allow the FS to leave super->s_dirt set after a
# write_super call.
#
diff -urN --exclude *.orig parent/fs/buffer.c comp/fs/buffer.c
--- parent/fs/buffer.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/buffer.c	Mon Apr 29 10:20:22 2002
@@ -325,6 +325,8 @@
 	lock_super(sb);
 	if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
 		sb->s_op->write_super(sb);
+	if (sb->s_op && sb->s_op->commit_super)
+		sb->s_op->commit_super(sb);
 	unlock_super(sb);
 	unlock_kernel();
@@ -344,7 +346,7 @@
 	lock_kernel();
 	sync_inodes(dev);
 	DQUOT_SYNC(dev);
-	sync_supers(dev);
+	commit_supers(dev);
 	unlock_kernel();
 	return sync_buffers(dev, 1);
diff -urN --exclude *.orig parent/fs/reiserfs/bitmap.c comp/fs/reiserfs/bitmap.c
--- parent/fs/reiserfs/bitmap.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/bitmap.c	Mon Apr 29 10:20:19 2002
@@ -122,7 +122,6 @@
 	set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
 	journal_mark_dirty (th, s, sbh);
-	s->s_dirt = 1;
 }

 void reiserfs_free_block (struct reiserfs_transaction_handle *th,
@@ -433,7 +432,6 @@
 	/* update free block count in super block */
 	PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
 	journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-	s->s_dirt = 1;
 	return CARRY_ON;
 }
diff -urN --exclude *.orig parent/fs/reiserfs/ibalance.c comp/fs/reiserfs/ibalance.c
--- parent/fs/reiserfs/ibalance.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/ibalance.c	Mon Apr 29 10:20:19 2002
@@ -632,7 +632,6 @@
 	/* use check_internal if new root is an internal node */
 	check_internal (new_root);
 	/*&&*/
-	tb->tb_sb->s_dirt = 1;

 	/* do what is needed for buffer thrown from tree */
 	reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 	PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 	PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
 	do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-	tb->tb_sb->s_dirt = 1;
 }

 if ( tb->blknum[h] == 2 ) {
diff -urN --exclude *.orig parent/fs/reiserfs/journal.c comp/fs/reiserfs/journal.c
--- parent/fs/reiserfs/journal.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/journal.c	Mon Apr 29 10:20:21 2002
@@ -64,12 +64,15 @@
 */
 static int reiserfs_mounted_fs_count = 0 ;

+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;

 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;

 #define JOURNAL_TRANS_HALF 1018 /* must be correct to keep the desc and commit structs at 4k */
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
 	PROC_INFO_INC( p_s_sb, journal.lock_journal );
-	while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-		PROC_INFO_INC( p_s_sb, journal.lock_journal_wait );
-		sleep_on(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
-	}
-	atomic_set(&(SB_JOURNAL(p_s_sb)->j_wlock), 1) ;
+	down(&SB_JOURNAL(p_s_sb)->j_lock);
 }

 /* unlock the current transaction */
 inline static void unlock_journal(struct super_block *p_s_sb) {
-	atomic_dec(&(SB_JOURNAL(p_s_sb)->j_wlock)) ;
-	wake_up(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
+	up(&SB_JOURNAL(p_s_sb)->j_lock);
 }

 /*
@@ -756,7 +754,6 @@
 	atomic_set(&(jl->j_commit_flushing), 0) ;
 	wake_up(&(jl->j_commit_wait)) ;
-	s->s_dirt = 1 ;
 	return 0 ;
 }
@@ -1220,7 +12
RE: [reiserfs-list] fsync() Performance Issue
Agreed, it would be better to sync to disk after multiple files rather than serially; however, in the interest of not having to worry about a power outage during the process (one of the reasons the disk cache is disabled), the choice was to fsync() each write.

-----Original Message-----
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 12:46 PM
To: [EMAIL PROTECTED]
Cc: Russell Coker; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [reiserfs-list] fsync() Performance Issue

On Mon, 2002-04-29 at 12:32, Toby Dickenson wrote:
> > One thing that has occurred to me (which has not been previously discussed as
> > far as I recall) is the possibility of using sync() instead of fsync() if
> > you can accumulate a number of files (and therefore replace many fsync()s
> > with one sync()).
>
> I can see
>
> write to file A
> write to file B
> write to file C
> sync
>
> might be faster than
>
> write to file A
> fsync A
> write to file B
> fsync B
> write to file C
> fsync C

Correct.

> but is it possible for it to be faster than
>
> write to file A
> write to file B
> write to file C
> fsync A
> fsync B
> fsync C

It depends on the rest of the system. sync() goes through the big lru list for the whole box, and fsync() goes through the private list for just that inode. If you've got other devices or files with dirty data, case C that you presented will always be the fastest. For general use, I like this one the best; it is what the journal code is optimized for. If files A, B, and C are the only dirty things on the whole box, a single sync() will be slightly better, mostly due to reduced cpu time.

-chris
[reiserfs-list] fsync() Performance Issue
I'm wondering if anyone out there may have some suggestions on how to improve the performance of a system employing fsync(). I have to be able to guarantee that every write to my fileserver is on disk once the client has passed it to the server. Therefore, I have disabled the write cache on the disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs 3.6.25, without additional patches. I have seen some discussions out here about various "speed-up" patches and am wondering if I need to add these to 2.4.19-pre7. What are they, and where can I obtain said patches? Also, I'm wondering if there is another solution for syncing the data that is faster than fsync(). Testing, thus far, has shown a large disparity between running with and without sync. Another idea is to explore another filesystem, but I'm not exactly excited by the other journaling filesystems out there at this time. All ideas will be greatly appreciated.

Wayne
EMC Corp
Centera Engineering
4400 Computer Drive M/S F213
Westboro, MA 01580
email: [EMAIL PROTECTED]
voice: (508) 898-6564
pager: (888) 769-4578 (numeric) [EMAIL PROTECTED] (alpha)
fax: (508) 898-6388
"One man can make a difference, and every man should try." - JFK