Re: [BUG] Linux 2.6.25-rc2 - Kernel Ooops while running dbench

2008-02-18 Thread Jeff Garzik
Two x86-64 boxes here lock up on 2.6.25-rc2, shortly after boot: one
running Fedora 8 + X (GNOME) and one a headless file server. 
Configs and lspci attached.  Unable to capture any splatter so far.


Bisecting...


00:00.0 Host bridge: Intel Corporation 82955X Memory Controller Hub
00:01.0 PCI bridge: Intel Corporation 82955X PCI Express Root Port
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) SATA AHCI Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] (rev a1)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 01)
05:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)

00:00.0 Host bridge: Intel Corporation 82975X Memory Controller Hub
00:01.0 PCI bridge: Intel Corporation 82975X PCI Express Root Port
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GH (ICH7DH) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) SATA AHCI Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc R580 [Radeon X1900 XT] (Primary)
01:00.1 Display controller: ATI Technologies Inc R580 [Radeon X1900 XT] (Secondary)
02:00.0 Multimedia controller: Philips Semiconductors Unknown device 7162
04:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
05:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI
05:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
05:05.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)


pretzel.bz2
Description: application/bzip


core.bz2
Description: application/bzip


Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Jeff Garzik

Alex Tomas wrote:

So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this generic delayed allocation code.


I didn't say generic, see Subject: :)


Well, it shouldn't even be in the VFS layer if it's only usable by one 
filesystem.


Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Jeff Garzik

Alex Tomas wrote:

Jeff Garzik wrote:

Is this based on Christoph's work?

Christoph, or some other XFS hacker, already did generic delalloc, 
modeled on the XFS delalloc code.


nope, this one is simple (something I'd prefer for ext4).


The XFS one is proven and the work was already completed.

What were the specific technical issues that made it unsuitable for ext4?

I would rather not reinvent the wheel, particularly if the reinvention 
is less capable than the existing work.


Jeff





Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Jeff Garzik

Alex Tomas wrote:

Good day,

please review ...

thanks, Alex


basic delayed allocation in VFS:

 * block_prepare_write() can be passed a special ->get_block() which
   doesn't allocate blocks, but reserves them and marks the bh delayed
 * a filesystem can use mpage_da_writepages() with another ->get_block()
   which doesn't defer allocation. mpage_da_writepages() finds all
   non-allocated blocks and tries to allocate them with minimal calls
   to ->get_block(), then submits IO using __mpage_writepage()
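
The two steps described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the patch itself: every name here (toy_file, toy_write_block, toy_writepages, toy_alloc_extent) is hypothetical, and the "allocator" simply hands out the next free block numbers. The write path reserves space and marks the buffer delayed; the writeback path finds each run of contiguous delayed buffers and allocates the whole run with a single allocator call.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* Per-block state of a file's buffers in this toy model. */
struct toy_buffer {
    bool dirty;      /* written by the application */
    bool delayed;    /* space reserved, no disk block assigned yet */
    long disk_block; /* -1 until allocated */
};

struct toy_file {
    struct toy_buffer bh[NBLOCKS];
    long reserved;   /* blocks reserved but not yet allocated */
    long next_free;  /* next free disk block (trivial allocator) */
    int alloc_calls; /* how many times the allocator was invoked */
};

void toy_file_init(struct toy_file *f)
{
    for (size_t i = 0; i < NBLOCKS; i++) {
        f->bh[i].dirty = false;
        f->bh[i].delayed = false;
        f->bh[i].disk_block = -1;
    }
    f->reserved = 0;
    f->next_free = 0;
    f->alloc_calls = 0;
}

/* Write path: reserve space and mark the buffer delayed; do not allocate. */
int toy_write_block(struct toy_file *f, size_t idx)
{
    if (idx >= NBLOCKS)
        return -1;
    struct toy_buffer *b = &f->bh[idx];
    if (b->disk_block < 0 && !b->delayed) {
        b->delayed = true;
        f->reserved++;
    }
    b->dirty = true;
    return 0;
}

/* Allocator stand-in: hands out `count` contiguous disk blocks per call. */
static long toy_alloc_extent(struct toy_file *f, size_t count)
{
    long start = f->next_free;
    f->next_free += (long)count;
    f->alloc_calls++;
    return start;
}

/*
 * Writeback path: allocate each run of contiguous delayed buffers with
 * one allocator call, then "submit" them. Returns the number of blocks
 * written out.
 */
int toy_writepages(struct toy_file *f)
{
    int written = 0;
    size_t i = 0;

    while (i < NBLOCKS) {
        if (!f->bh[i].delayed) {
            /* Already allocated (or untouched): just write it if dirty. */
            if (f->bh[i].dirty) {
                f->bh[i].dirty = false;
                written++;
            }
            i++;
            continue;
        }
        size_t end = i;
        while (end < NBLOCKS && f->bh[end].delayed)
            end++;
        long start = toy_alloc_extent(f, end - i);
        for (size_t j = i; j < end; j++) {
            f->bh[j].disk_block = start + (long)(j - i);
            f->bh[j].delayed = false;
            f->bh[j].dirty = false;
            f->reserved--;
            written++;
        }
        i = end;
    }
    return written;
}
```

Writing blocks 2, 3, 4 and 9 and then calling toy_writepages() results in two allocator calls, one per contiguous run, rather than four.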


Signed-off-by: Alex Tomas [EMAIL PROTECTED]


Is this based on Christoph's work?

Christoph, or some other XFS hacker, already did generic delalloc, 
modeled on the XFS delalloc code.


Jeff




Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-29 Thread Jeff Garzik

Theodore Tso wrote:

I don't think we have a problem here.  What we have now is fine, and


It's fine for ext4, but not the wider world.  This is a common problem 
created by parallel development when code dependencies exist.




In any case, the plan is to push all of the core bits into Linus tree
for 2.6.22 once it opens up, which should be Real Soon Now, it looks
like.


Presumably you mean 2.6.23.

Jeff





Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Jeff Garzik

Andrew Morton wrote:

b) We do what we normally don't do and reserve the syscall slots in mainline.


If everyone agrees it's going to happen... why not?

Jeff




Re: [RFC] Heads up on sys_fallocate()

2007-03-01 Thread Jeff Garzik

Amit K. Arora wrote:

This is to give a heads-up on a few patches that we will soon be posting.
These patches implement a new system call sys_fallocate() and a
new inode operation fallocate, for persistent preallocation. The new
system call, as Andrew suggested, will look like:

  asmlinkage long sys_fallocate(int fd, loff_t offset, loff_t len);

As we are developing and testing the required patches, we decided to
post a preliminary patch and get input from the community to give it
the right direction and shape. First, a short description of the feature.

Persistent preallocation is a file system feature with which an
application (say, a relational database server) can explicitly
preallocate blocks to a particular file. This feature can be used to
reserve space for a file, with two main benefits:
1) contiguity - less fragmentation and thus faster access, and
2) a guaranteed minimum of available space (depending on how many
blocks were preallocated) for the file, even if the filesystem becomes
full.

XFS already has an implementation of this, using an ioctl interface, and
ext4 is now gaining this feature. In time we may see a few more file
systems implementing it. Thus, it makes sense to have a more standard
interface for this, like this new system call.

Here is the initial and incomplete version of the patch, which can be
used for the discussion, till we come up with a set of more complete
patches.

---
 arch/i386/kernel/syscall_table.S |    1 +
 fs/ext4/file.c                   |    1 +
 fs/open.c                        |   18 ++
 include/asm-i386/unistd.h        |    3 ++-
 include/linux/fs.h               |    1 +
 include/linux/syscalls.h         |    1 +
 6 files changed, 24 insertions(+), 1 deletion(-)


I certainly agree that we want something like this.

posix_fallocate() is the glibc interface we want to be compatible with 
(which your definition is, AFAICS).
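
A minimal userspace sketch of the glibc interface mentioned above, assuming a POSIX system; preallocate_file() and file_size() are hypothetical helper names and are not part of the proposed patch. posix_fallocate() takes the same (fd, offset, len) triple as the proposed sys_fallocate(), and it returns an error number directly rather than setting errno.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reserve `len` bytes for the file at `path`, creating it if needed.
 * Returns 0 on success or a positive error number on failure.
 */
int preallocate_file(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;

    /* posix_fallocate() reports failure through its return value. */
    int err = posix_fallocate(fd, 0, len);

    if (close(fd) != 0 && err == 0)
        err = errno;
    return err;
}

/* Current size of the file in bytes, or -1 if it cannot be stat()ed. */
off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

After a successful call the file is at least `len` bytes long, and later writes into that range should not fail for lack of space.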


Jeff





Re: How git affects kernel.org performance

2007-01-08 Thread Jeff Garzik

Theodore Tso wrote:

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases.  If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much.  Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead.  This also has the advantage that it will help without
needing to do a backup/restore to improve layout.



Something I just thought of:  ATA and SCSI hard disks do their own 
read-ahead.  Seeking all over the place to pick up bits of directory 
will hurt even more with the disk reading and throwing away data (albeit 
in its internal elevator and cache).


Jeff




Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 03:38:23PM +1000, David Chinner wrote:
 On Wed, Oct 25, 2006 at 12:48:44AM -0400, Jeff Garzik wrote:
  On Wed, Oct 25, 2006 at 02:27:53PM +1000, David Chinner wrote:
   But it is a race that is _easily_ handled, and applications only need to
   implement one interface, not a different method for every
   filesystem that requires deep filesystem knowledge.
   
   Besides, you still have to handle the case where the block you want
   has already been allocated because reading the metadata from
   userspace doesn't prevent the kernel from allocating the block you
   want before you ask for it...
  
  The race is easily handled either way, by having the block move fail
  when you tell the kernel the destination blocks.
 
 So why are you arguing that an interface is no good because it
 is fundamentally racy? ;)

My point was that it is silly to introduce obviously racy code into the
kernel, when -- inside the kernel -- it could be handled race-free.

If you accept a racy solution, you might as well do it outside the
kernel, where you get the same results, but without adding silliness and
bloat to the kernel.


  Every major filesystem has a libfoofs library that makes it trivial to
  read the metadata, so all you need to do is use an existing lib.
 
 IOWs, you are advocating that any application that wants to use this
 special allocation technique needs to link against every different
 filesystem library and it then needs to implement filesystem
 specific searches through their metadata?  Nobody in their right
 mind would ever want to use an interface like this.

Online defrag is OBVIOUSLY highly filesystem specific.  You have to link
against filesystem specific code somewhere, whether it's inside the
kernel or outside the kernel.

Further, in the case being discussed in this thread, ext2meta has
already been proven a workable solution.

Jeff





Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 06:11:37PM +1000, David Chinner wrote:
 On Wed, Oct 25, 2006 at 02:01:42AM -0400, Jeff Garzik wrote:
  On Wed, Oct 25, 2006 at 03:38:23PM +1000, David Chinner wrote:
   On Wed, Oct 25, 2006 at 12:48:44AM -0400, Jeff Garzik wrote:
   So why are you arguing that an interface is no good because it
   is fundamentally racy? ;)
  
  My point was that it is silly to introduce obviously racy code into the
  kernel, when -- inside the kernel -- it could be handled race-free.
 
 So how do you then get the generic interface to allocate blocks
 specified by userspace race free?

As has been repeatedly stated, there is no generic.  There MUST be
filesystem-specific knowledge during these operations.


 If userspace directed allocation requires deep knowledge of the
 filesystem metadata (this is what you are saying they need to do,
 right?), then these applications will never, ever make use of this
 interface and we'll continue to have problems with them.

Completely false assumptions.  There is no difference in handling of
knowledge, be it kernel space or userspace.


  Further, in the case being discussed in this thread, ext2meta has
  already been proven a workable solution.
 
 Sure, but that's not a generic solution to a problem common to
 all filesystems

You clearly don't know what I'm talking about.  ext2meta is an example
of a filesystem-specific metadata access method, applicable to tasks
such as online optimization.

Implement that tiny kernel module for each filesystem, and you have
everything you need, without races.  This was discussed years ago;
review the mailing lists.  Google for 'Alexander Viro' and 'ext2meta'.

Jeff





Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 04:54:50PM +0200, Jan Kara wrote:
   Yes, this sounds feasible. We could split the defrag ioctl into two
 pieces (addition of given extent to a file and swapping of extents), which
 can have generic interface... 

An ioctl is UGLY.

This was discussed years ago.  Google for 'Alexander Viro' and
'ext2meta'.  That's a clean, flexible, extensible way to access metadata
online.  No need for ioctl binary translation across 32bit-64bit, or
any other ioctl issue.

Jeff





Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 08:25:30PM +0200, Jan Kara wrote:
   I see. So you mean that in our ext3meta filesystem we'd have a file
 named add_this_extent_to_inode and a file reloc_inode_interval and
 they'd be fed essentially the same info as the current ioctl interface and
 do the same thing as we currently do. Hmm, I don't find it that nice any
 more but yes, this would work.

It depends on the operation.  ext2meta[1] works fine for online
defrag, just exporting metadata objects and providing read(2)
and write(2) operations on them.  Adding 'trigger' files (like your
add_this_extent_to_inode) may make sense for some operations, indeed,
but we need to see the whole picture before really understanding
whether that interface is optimal.

Jeff


[1] http://linux.yyz.us/misc/ext2meta.c


Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 08:36:56PM +0200, Jan Kara wrote:
   Yes, but there's a question of the interface to this operation. How to
 specify which indirect block I mean? Obviously we could introduce
 separate call for remapping indirect blocks but I find this solution
 kind of clumsy...

Agreed...  that gets nasty real quick.

Jeff





Re: [RFC] Ext3 online defrag

2006-10-24 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 12:30:02PM +1000, Barry Naujok wrote:
 Could we have a more abstract method for asking the filesystem where the 
 free blocks are and then using the same block addressing to tell the
 fs where to allocate/move the file's data to?

That's fundamentally racy, so you might as well just read the
filesystem metadata from userspace.  No need to go through the kernel
for that.

Jeff





Re: [RFC] Ext3 online defrag

2006-10-24 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 02:27:53PM +1000, David Chinner wrote:
 But it is a race that is _easily_ handled, and applications only need to
 implement one interface, not a different method for every
 filesystem that requires deep filesystem knowledge.
 
 Besides, you still have to handle the case where the block you want
 has already been allocated because reading the metadata from
 userspace doesn't prevent the kernel from allocating the block you
 want before you ask for it...

The race is easily handled either way, by having the block move fail
when you tell the kernel the destination blocks.
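
A rough sketch of that failure mode, as a toy in-memory model rather than real filesystem code; toy_fs, toy_claim() and toy_move_block() are hypothetical names, and a real implementation would do the check and the claim under the filesystem's allocation lock. The point is only that a stale view of the free-block map in the caller leads to a failed move, never to an overwritten block.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 64

struct toy_fs {
    bool in_use[TOY_NBLOCKS]; /* block allocation bitmap */
    int data[TOY_NBLOCKS];    /* stand-in for block contents */
};

/* Allocate a specific block, as a concurrent writer might. */
int toy_claim(struct toy_fs *fs, size_t blk)
{
    if (blk >= TOY_NBLOCKS)
        return -EINVAL;
    if (fs->in_use[blk])
        return -EBUSY;
    fs->in_use[blk] = true;
    return 0;
}

/*
 * Move the contents of block `src` to block `dst`. The destination is
 * checked and claimed in one step, so if it was allocated after the
 * caller read the free-block map, the move fails with -EBUSY and the
 * source block is left untouched.
 */
int toy_move_block(struct toy_fs *fs, size_t src, size_t dst)
{
    if (src >= TOY_NBLOCKS || dst >= TOY_NBLOCKS)
        return -EINVAL;
    if (!fs->in_use[src])
        return -ENOENT;
    if (fs->in_use[dst])
        return -EBUSY;

    fs->in_use[dst] = true;
    fs->data[dst] = fs->data[src];
    fs->in_use[src] = false;
    return 0;
}
```

A defragger that gets -EBUSY simply re-reads the metadata and picks another destination.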

The difference is that you don't unnecessarily bloat the kernel.

Every major filesystem has a libfoofs library that makes it trivial to
read the metadata, so all you need to do is use an existing lib.

Jeff





Re: [RFC] Ext3 online defrag

2006-10-23 Thread Jeff Garzik
On Mon, Oct 23, 2006 at 06:31:40PM +0400, Alex Tomas wrote:
 isn't that a kernel responsibility to find/allocate target blocks?
 wouldn't it be better to specify desirable target group and minimal
 acceptable chunk of free blocks?

The kernel doesn't have enough knowledge to know whether or not the
defragger prefers one blkdev location over another.

When you are trying to consolidate blocks, you must specify the
destination as well as source blocks.

Certainly, to prevent corruption and other nastiness, you must fail if
the destination isn't available...

(ext2meta did all this...)

Jeff

