Re: [reiserfs-list] Question on mount with big volumes

2001-08-16 Thread Philippe Gramoulle


Hi,

Ragnar Kjørstad wrote:
 
 On Thu, Aug 16, 2001 at 06:44:56PM +0200, Philippe Gramoulle wrote:
  Has someone already created volumes above 1 terabyte?
 
 Yes, of course :)

Great! What hardware/drivers did you use?

 
 For now Linux uses 32-bit sector indexing. This limits you to 2
 TB, or 1 TB if your drivers happen to use signed integers and use
 negative numbers for error codes. Work is being done to change
 this to 64-bit indexing, but it will probably not be in the standard
 kernel until 2.5.

We use DELL PERC3/QC (aka AMI MegaRAID) and I thought the megaraid
driver would handle 2 TB just fine. I'm waiting for confirmation from DELL on
this issue.

Thanks,

Philippe.



Re: [reiserfs-list] Question on mount with big volumes

2001-08-16 Thread Ragnar Kjørstad

On Thu, Aug 16, 2001 at 10:02:09PM +0200, Philippe Gramoulle wrote:
 Ragnar Kjørstad wrote:
  
  On Thu, Aug 16, 2001 at 06:44:56PM +0200, Philippe Gramoulle wrote:
   Has someone already created volumes above 1 terabyte?
  
  Yes, of course :)
 
 Great! What hardware/drivers did you use?

Naturally we use Big Storage tRAIDS :)
(SCSI-to-SCSI RAIDs)

In our testing we've used the new aic7xxx driver in the 2.4 kernel.
Works like a charm. :)


-- 
Ragnar Kjorstad
Big Storage



Re: [reiserfs-list] Question on mount with big volumes

2001-08-16 Thread Hans Reiser

The starvation occurs when some process sends large requests to the same SCSI
controller as our journal replay, which sends one-block requests, and the
one-block requests starve. RAID resync is one known instance where this happens.
Edward's patch cures that instance.

Hans

Edward Shushkin wrote:
 
 Philippe Gramoulle wrote:
 
  Hi,
 
  We've set up a test system: a Linux box with a PERC3/QC AMI RAID card
  (MegaRAID driver) and 3 disk shelves of 12x36 GB (RAID5, 2 spare
  disks on each shelf, 1 terabyte total).
 
  First of all, there is an odd message at boot:
 
  megaraid: v1.15d (Release Date: Wed May 30 17:30:41 EDT 2001)
  megaraid: found 0x101e:0x1960:idx 0:bus 2:slot 0:func 0
  scsi2 : Found a MegaRAID controller at 0xf8902000, IRQ: 20
  megaraid: [1.57:3.13] detected 2 logical drives
  scsi2 : AMI MegaRAID 1.57 254 commands 16 targs 4 chans 40 luns
  Attached scsi disk sdb at scsi2, channel 4, id 0, lun 0
  Attached scsi disk sdc at scsi2, channel 4, id 0, lun 1
  SCSI device sdb: 318468096 512-byte hdwr sectors (163056 MB)
sdb: sdb1  ^^^
  SCSI device sdc: 2059595776 512-byte hdwr sectors (-44998 MB)
^^^
  Why is sdc reporting -44998 MB??
 
  Nevertheless, fdisk'ing /dev/sdc runs fine.
  mkreiserfs runs fine as well.
 
  Mounting the partition for the first time took 32 minutes.
  Unmounting and remounting it a second time took 1 minute 32 seconds.
  Unmounting and remounting it once more took 32 minutes again.
 
  There were absolutely no operations done in between.
 
  What do you think makes the mount take so long?
 
 It looks like you hit the worst case, where the system has to search for
 a valid transaction (when your fs has just been created or was not
 cleanly unmounted) and reads all journal blocks during a RAID5 resync,
 which generates a large number of I/O requests. If so, there can never
 be more than one journal request in the queue, because of
 wait_on_buffer(), and that request cannot be merged with other journal
 requests.
 You probably want the attached patch against 2.4.7, which reads ahead
 32 journal blocks instead of calling bread() for each one. We have
 tested it a bit, and mount time seems to be reduced.
 Please report your results.
 Thanks,
 Edward.
 
 
  Aren't we hitting a 32-bit issue here? Replacing the 36 GB disks with
  73 GB disks gives me "unable to open /dev/sdc" when I try to run fdisk.
 
  Has someone already created volumes above 1 terabyte?
 
  We're currently trying the same tests with ext2 but mke2fs takes a
  *long* time compared to mkreiserfs :o). I'll give you the results soon.
 
  Thanks,
 
  Philippe.
 
   
 --- linux-2.4.7/fs/reiserfs/journal.c   Mon Aug  6 15:29:31 2001
 +++ linux-2.4.7-new/fs/reiserfs/journal.c   Tue Aug 14 22:57:06 2001
 @@ -81,6 +81,7 @@
  DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
 
  #define JOURNAL_TRANS_HALF 1018   /* must be correct to keep the desc and commit structs at 4k */
 +#define NBUF 32 /* read ahead */
 
  /* cnode stat bits.  Move these into reiserfs_fs.h */
 
 @@ -1597,7 +1598,9 @@
    int replay_count = 0 ;
    int continue_replay = 1 ;
    int ret ;
 -
 +  int need_read_ahead = 1;
 +  int first_read_ahead = 0;
 +  struct buffer_head * log_blocks[NBUF];
    cur_dblock = reiserfs_get_journal_block(p_s_sb) ;
    printk("reiserfs: checking transaction log (device %s) ...\n",
           kdevname(p_s_sb->s_dev)) ;
 @@ -1653,7 +1656,29 @@
    ** all the valid transactions, and pick out the oldest.
    */
    while(continue_replay && cur_dblock < (reiserfs_get_journal_block(p_s_sb) + JOURNAL_BLOCK_COUNT)) {
 -    d_bh = bread(p_s_sb->s_dev, cur_dblock, p_s_sb->s_blocksize) ;
 +    if (need_read_ahead) {
 +      /* read ahead NBUF buffers */
 +      int i;
 +      first_read_ahead = cur_dblock;
 +      for (i = 0; i < NBUF; i++) {
 +        log_blocks [i] = getblk (p_s_sb->s_dev, first_read_ahead + i,
 +                                 p_s_sb->s_blocksize);
 +        if (!log_blocks [i]) {
 +          brelse_array (log_blocks, i);
 +          return -1;
 +        }
 +      }
 +      ll_rw_block (READ, NBUF, log_blocks);
 +      for (i = 0; i < NBUF; i++) {
 +        wait_on_buffer (log_blocks [i]);
 +        if (!buffer_uptodate (log_blocks [i])) {
 +          brelse_array (log_blocks, NBUF);
 +          return -1;
 +        }
 +      }
 +      need_read_ahead = 0;
 +    }
 +    d_bh = log_blocks[cur_dblock - first_read_ahead];
      ret = journal_transaction_is_valid(p_s_sb, d_bh, &oldest_invalid_trans_id, &newest_mount_id) ;
      if (ret == 1) {
        desc = (struct reiserfs_journal_desc *)d_bh->b_data ;
 @@ -1680,12 +1705,17 @@
                     "newest_mount_id to %d\n", le32_to_cpu(desc->j_mount_id));
        }
        cur_dblock += le32_to_cpu(desc->j_len) + 2 ;
 -    }
 -    else {
 +    } else
 +