Re: Major regression on hackbench with SLUB

2007-12-09 Thread Björn Steinbrink
On 2007.12.08 17:16:24 -0500, Steven Rostedt wrote:
> 
> Hi Linus,
> 
> > On Fri, 7 Dec 2007, Linus Torvalds wrote:
> > >
> > > Can you do one run with oprofile, and see exactly where the cost is? It
> > > should hopefully be pretty darn obvious, considering your timing.
> 
> The results are here:
> 
> http://people.redhat.com/srostedt/slub/results/slab.op
> http://people.redhat.com/srostedt/slub/results/slub.op

Hm, you seem to be hitting the "another_slab" stuff in __slab_alloc
a lot. I wonder if !node_match triggers too often. We always start with
the per-cpu slab; if that one is on the wrong node, you'll always hit
that "another_slab" path.

After searching for way too long (given that I have no clue about that
stuff anyway and just read the code out of curiosity), I noticed that
the cpu_to_node stuff on x86_64 seems to be initialized to 0xff
(arch/x86/mm/numa_64.c), and Google brought me this dmesg output [1],
which, AFAICT, shows that the per-cpu slab setup is done _before_
cpu_to_node is correctly set up. That would lead to the per-cpu slabs all
having node == 0xff, which looks pretty bad.
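To make the hypothesis above concrete, here is a minimal C model, not the actual SLUB code: a per-cpu slab whose recorded node is the uninitialized 0xff value never matches an explicit node request, so every such allocation takes the slow "another_slab" path. The struct and function names are hypothetical; node_match_model() only mirrors the idea of node_match(), where -1 means "any node". Requests passing -1 still match, so whether this explains the regression would depend on how often callers ask for a specific node.

```c
#include <assert.h>
#include <stdbool.h>

/* Value cpu_to_node reportedly holds before NUMA setup (per the mail). */
#define NODE_UNSET 0xff

/* Hypothetical, heavily simplified stand-in for a per-cpu slab. */
struct cpu_slab_model {
	int node;            /* node this per-cpu slab was recorded for */
	unsigned long fast;  /* allocations served from the per-cpu slab */
	unsigned long slow;  /* allocations that went to "another_slab" */
};

/* Same idea as node_match(): -1 means the caller accepts any node. */
static bool node_match_model(const struct cpu_slab_model *c, int node)
{
	return node == -1 || c->node == node;
}

/* One allocation: fast path on a node match, slow path otherwise. */
static void alloc_model(struct cpu_slab_model *c, int node)
{
	if (node_match_model(c, node))
		c->fast++;
	else
		c->slow++;
}
```

With node == 0xff recorded, a stream of node-0 requests never hits the fast path, while node == -1 requests are unaffected.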

Disclaimer: I read the slub/numa/$WHATEVER_I_SAW_THERE for the first
time, so this might be total bull ;-)

Björn

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-10/msg04648.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Fix use of skb after netif_rx

2007-12-09 Thread Wang Chen
Julia Lawall said the following on 2007-12-10 15:18:
>> Julia, seems that your semantic patch misses following place.
>>
>> drivers/s390/net/qeth_main.c:2733
>> ...
>> #endif
>>  rxrc = netif_rx(skb);
>>  card->dev->last_rx = jiffies;
>>  card->stats.rx_packets++;
>>  card->stats.rx_bytes += skb->len;
>> ...
> 
> Actually, I found this one as well, but I wasn't sure what to do with it.  
> This one is a bit more complicated because the line with the call to 
> netif_rx is in an else branch if the #ifdef above is taken.  So I wasn't 
> sure what would be the best way to solve the problem in this case.
> 
> Perhaps the solution would be just to save the value of the len field 
> in a local variable in this case, as you proposed in your original patch.
> 

I agree.

BTW, please send driver patches to Jeff Garzik <[EMAIL PROTECTED]> and
cc [EMAIL PROTECTED]

--
WCN



Re: [PATCH 3/3] Fix use of skb after netif_rx

2007-12-09 Thread Julia Lawall
> > // 
> > 
> > diff a/drivers/s390/net/ctcmain.c b/drivers/s390/net/ctcmain.c
> > diff a/drivers/s390/net/netiucv.c b/drivers/s390/net/netiucv.c
> 
> Julia, seems that your semantic patch misses following place.
> 
> drivers/s390/net/qeth_main.c:2733
> ...
> #endif
>   rxrc = netif_rx(skb);
>   card->dev->last_rx = jiffies;
>   card->stats.rx_packets++;
>   card->stats.rx_bytes += skb->len;
> ...

Actually, I found this one as well, but I wasn't sure what to do with it.  
This one is a bit more complicated because the line with the call to 
netif_rx is in an else branch if the #ifdef above is taken.  So I wasn't 
sure what would be the best way to solve the problem in this case.

Perhaps the solution would be just to save the value of the len field 
in a local variable in this case, as you proposed in your original patch.
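A minimal sketch of that approach, using simplified stand-ins (fake_skb, fake_netif_rx() and fake_stats are hypothetical, not the qeth or networking code): read skb->len into a local before handing the buffer to netif_rx(), and use only the local afterwards, since the stack may free the skb as soon as it owns it. In qeth_main.c the local would presumably need to be read before whichever call consumes the skb in each branch of the #ifdef.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	unsigned int len;
};

struct fake_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
};

/* Consumes the buffer, as handing an skb to netif_rx() effectively does. */
static int fake_netif_rx(struct fake_skb *skb)
{
	free(skb);
	return 0;
}

/* Save skb->len while we still own the skb; never touch skb afterwards. */
static int rx_one(struct fake_skb *skb, struct fake_stats *stats)
{
	unsigned int len = skb->len;
	int rc = fake_netif_rx(skb);

	stats->rx_packets++;
	stats->rx_bytes += len;
	return rc;
}
```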

julia



Re: [PATCH] vivi driver works only as first device

2007-12-09 Thread Mauro Carvalho Chehab
Hi Gregor,
On Thu, 2007-12-06 at 23:06 +0100, Gregor Jasny wrote:
> From: Gregor Jasny <[EMAIL PROTECTED]>
> 
> When the vivi driver allocates a video device, video_register_device() stores the
> allocated device minor inside the vivi structure. But when the device node is opened,
> the file minor number is compared to the minor in the device list. So this patch
> copies the allocated minor into the device list, too.

Thanks for the report. Instead of applying your patch, I decided to
analyze the issue more thoroughly and fix it properly. The issue
is that vivi_register changes iminor, but this change was not properly
propagated back to the driver.
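The bug as Gregor describes it can be modeled in a few lines of C. This is a hypothetical sketch (all names are illustrative, and Mauro's committed fix may be structured differently): registration picks a minor and stores it only in the video device structure, while open() searches the driver's own list, so the minor has to be copied back for the lookup to succeed when vivi is not the first video device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the vivi bookkeeping. */
struct vfd_model {
	int minor;                  /* set by "registration" */
};

struct vivi_dev_model {
	int minor;                  /* what the open() lookup compares against */
	struct vfd_model vfd;
};

static int next_free_minor;     /* next minor the "core" will hand out */

/* Stands in for video_register_device(): stores the minor only in vfd. */
static int register_model(struct vfd_model *vfd)
{
	vfd->minor = next_free_minor++;
	return 0;
}

/* Registration plus the copy-back of the allocated minor. */
static int vivi_register_model(struct vivi_dev_model *dev)
{
	int ret = register_model(&dev->vfd);

	if (ret == 0)
		dev->minor = dev->vfd.minor;
	return ret;
}

/* open(): find the device whose recorded minor matches the file's minor. */
static struct vivi_dev_model *find_by_minor(struct vivi_dev_model *devs,
					    size_t n, int minor)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].minor == minor)
			return &devs[i];
	return NULL;
}
```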

-- 
Cheers,
Mauro



[PATCH 2/2] pci: Fix warning in setup-res.c on 32-bit platforms with 64-bit resources

2007-12-09 Thread Benjamin Herrenschmidt
This adds appropriate casts to avoid a warning and print the correct
values in pr_debug.
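The reason for the casts, sketched in userspace C under the assumption that resource_size_t is 64 bits wide (as on a 32-bit kernel with 64-bit resources): %llx expects an unsigned long long, so casting each argument makes the format string correct regardless of the platform's resource_size_t width. The type and struct names below are hypothetical stand-ins for resource_size_t and struct pci_bus_region.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a 64-bit resource_size_t on a 32-bit platform. */
typedef uint64_t resource_size_t_model;

struct bus_region_model {
	resource_size_t_model start, end;
};

/* Cast to unsigned long long so the arguments always match %llx. */
static int format_region(char *buf, size_t len,
			 const struct bus_region_model *r)
{
	return snprintf(buf, len, "bus [%llx:%llx]",
			(unsigned long long)r->start,
			(unsigned long long)r->end);
}
```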

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

 drivers/pci/setup-res.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-work/drivers/pci/setup-res.c
===
--- linux-work.orig/drivers/pci/setup-res.c 2007-12-10 17:17:32.0 +1100
+++ linux-work/drivers/pci/setup-res.c  2007-12-10 17:17:54.0 +1100
@@ -51,10 +51,12 @@ pci_update_resource(struct pci_dev *dev,
 
pcibios_resource_to_bus(dev, &region, res);
 
-   pr_debug("  got res [%llx:%llx] bus [%lx:%lx] flags %lx for "
+   pr_debug("  got res [%llx:%llx] bus [%llx:%llx] flags %lx for "
 "BAR %d of %s\n", (unsigned long long)res->start,
 (unsigned long long)res->end,
-region.start, region.end, res->flags, resno, pci_name(dev));
+(unsigned long long)region.start,
+(unsigned long long)region.end,
+(unsigned long)res->flags, resno, pci_name(dev));
 
new = region.start | (res->flags & PCI_REGION_FLAG_MASK);
if (res->flags & IORESOURCE_IO)


[PATCH 1/2] pci: Fix bus resource assignment on 32 bits with 64b resources

2007-12-09 Thread Benjamin Herrenschmidt
The current pci_assign_unassigned_resources() code doesn't work properly
on 32-bit platforms with 64-bit resources. The main reason is the use
of unsigned long in various places instead of resource_size_t.

This fixes it, along with some tricks to avoid casting to 64 bits in
every printk on platforms that don't need it.

This is a prerequisite for making powerpc use the generic code instead of
its own half-useful implementation.
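As an illustration of the class of bug described above (a hypothetical example, not code taken from setup-bus.c), the following models a 32-bit platform by using uint32_t for unsigned long and uint64_t for a 64-bit resource_size_t: computing a 4 GiB window size in the narrower type silently wraps to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: unsigned long is 32 bits, resource_size_t is 64 bits. */
typedef uint32_t ulong32_model;
typedef uint64_t resource_size_t_model;

/* Old pattern: the result is squeezed into a 32-bit unsigned long. */
static ulong32_model window_size_ulong(resource_size_t_model start,
				       resource_size_t_model end)
{
	return (ulong32_model)(end - start + 1);
}

/* Fixed pattern: keep the arithmetic in resource_size_t. */
static resource_size_t_model window_size_res(resource_size_t_model start,
					     resource_size_t_model end)
{
	return end - start + 1;
}
```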

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

This version now uses casts as Greg asked for and adds proper setup
of the prefetchable base & limit "upper" registers when using 64-bit
resources.

(and builds ... sorry about that)
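For the prefetchable window, the idea is roughly the following sketch, which mirrors the masking pci_setup_bridge() uses for the 16-bit base/limit pair (written as one 32-bit value) and adds the two upper-32-bit values that a bridge with 64-bit prefetchable support exposes. The struct and function names are hypothetical, and the low four bits of each 16-bit register (read-only capability bits on real hardware) are simply left zero here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register images for a 64-bit prefetchable window. */
struct pref_regs_model {
	uint32_t l;   /* base[31:20] in bits 15:4, limit[31:20] in bits 31:20 */
	uint32_t bu;  /* upper 32 bits of the window base */
	uint32_t lu;  /* upper 32 bits of the window limit */
};

static struct pref_regs_model split_pref_window(uint64_t start, uint64_t end)
{
	struct pref_regs_model r;

	/* Lower 32 bits of the addresses, 1 MiB granularity. */
	r.l  = (uint32_t)((start >> 16) & 0xfff0);
	r.l |= (uint32_t)(end & 0xfff00000);
	/* What goes into the "upper" base/limit registers. */
	r.bu = (uint32_t)(start >> 32);
	r.lu = (uint32_t)(end >> 32);
	return r;
}
```

Without the upper registers, a window placed above 4 GiB would be programmed with only its low 32 address bits.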

 drivers/pci/setup-bus.c |   64 ++--
 include/linux/pci.h |4 +--
 2 files changed, 42 insertions(+), 26 deletions(-)

Index: linux-work/drivers/pci/setup-bus.c
===
--- linux-work.orig/drivers/pci/setup-bus.c 2007-12-10 17:16:44.0 +1100
+++ linux-work/drivers/pci/setup-bus.c  2007-12-10 17:21:53.0 +1100
@@ -89,8 +89,9 @@ void pci_setup_cardbus(struct pci_bus *b
 * The IO resource is allocated a range twice as large as it
 * would normally need.  This allows us to set both IO regs.
 */
-   printk("  IO window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  IO window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_0,
region.start);
pci_write_config_dword(bridge, PCI_CB_IO_LIMIT_0,
@@ -99,8 +100,9 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[1]);
if (bus->resource[1]->flags & IORESOURCE_IO) {
-   printk("  IO window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  IO window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_1,
region.start);
pci_write_config_dword(bridge, PCI_CB_IO_LIMIT_1,
@@ -109,8 +111,9 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[2]);
if (bus->resource[2]->flags & IORESOURCE_MEM) {
-   printk("  PREFETCH window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  PREFETCH window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_0,
region.start);
pci_write_config_dword(bridge, PCI_CB_MEMORY_LIMIT_0,
@@ -119,8 +122,9 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[3]);
if (bus->resource[3]->flags & IORESOURCE_MEM) {
-   printk("  MEM window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  MEM window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_1,
region.start);
pci_write_config_dword(bridge, PCI_CB_MEMORY_LIMIT_1,
@@ -145,7 +149,7 @@ pci_setup_bridge(struct pci_bus *bus)
 {
struct pci_dev *bridge = bus->self;
struct pci_bus_region region;
-   u32 l, io_upper16;
+   u32 l, bu, lu, io_upper16;
 
DBG(KERN_INFO "PCI: Bridge: %s\n", pci_name(bridge));
 
@@ -159,7 +163,8 @@ pci_setup_bridge(struct pci_bus *bus)
/* Set up upper 16 bits of I/O base/limit. */
io_upper16 = (region.end & 0xffff0000) | (region.start >> 16);
DBG(KERN_INFO "  IO window: %04lx-%04lx\n",
-   region.start, region.end);
+   (unsigned long)region.start,
+   (unsigned long)region.end);
}
else {
/* Clear upper 16 bits of I/O base/limit. */
@@ -180,8 +185,9 @@ pci_setup_bridge(struct pci_bus *bus)
if (bus->resource[1]->flags & IORESOURCE_MEM) {
l = (region.start >> 16) & 0xfff0;
l |= region.end & 0xfff00000;
-   DBG(KERN_INFO "  MEM window: %08lx-%08lx\n",
-   region.start, region.end);
+   DBG(KERN_INFO "  MEM window: 0x%08lx-0x%08lx\n",
+   (unsigned long)region.start,
+   

[GIT PULL] XFS update for 2.6.24-rc5

2007-12-09 Thread Lachlan McIlroy
Please pull from the for-linus branch:
git pull git://oss.sgi.com:8090/xfs/xfs-2.6.git for-linus

This will update the following files:

 fs/xfs/linux-2.6/xfs_buf.c |   37 +---
 fs/xfs/linux-2.6/xfs_file.c|  124 
 fs/xfs/linux-2.6/xfs_ioctl.c   |   20 +++
 fs/xfs/linux-2.6/xfs_ioctl32.c |3 +
 fs/xfs/linux-2.6/xfs_iops.c|4 +-
 fs/xfs/quota/xfs_qm.c  |3 +
 fs/xfs/xfs_iget.c  |2 +-
 fs/xfs/xfs_itable.c|   43 +-
 8 files changed, 186 insertions(+), 50 deletions(-)

through these commits:

commit cf10e82bdc0d38d09dfaf46d0daf56136138ef3f
Author: David Chinner <[EMAIL PROTECTED]>
Date:   Fri Dec 7 14:09:11 2007 +1100

[XFS] Fix xfs_ichgtime()s broken usage of I_SYNC

The recent I_LOCK->I_SYNC changes mistakenly changed xfs_ichgtime to look
at I_SYNC instead of I_LOCK. This was incorrect and prevents newly created
inodes from moving to the dirty list. Change this to the correct check
which is for I_NEW, not I_LOCK or I_SYNC so that behaviour is correct.

SGI-PV: 974225
SGI-Modid: xfs-linux-melb:xfs-kern:30204a

Signed-off-by: David Chinner <[EMAIL PROTECTED]>
Signed-off-by: Lachlan McIlroy <[EMAIL PROTECTED]>
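A minimal model of the corrected check, assuming made-up flag values (MODEL_I_NEW and friends are not the kernel's I_NEW, I_LOCK or I_SYNC bits): after the timestamps change, the inode is queued as dirty unless it is still being set up, and the model looks only at the I_NEW bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only. */
enum {
	MODEL_I_NEW  = 1 << 0,
	MODEL_I_LOCK = 1 << 1,
	MODEL_I_SYNC = 1 << 2,
};

struct inode_model {
	unsigned int i_state;
	bool on_dirty_list;
};

/* After a timestamp update, mark dirty unless the inode is still new. */
static void ichgtime_model(struct inode_model *ip)
{
	/* ... timestamps would be updated here ... */
	if (!(ip->i_state & MODEL_I_NEW))
		ip->on_dirty_list = true;
}
```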

commit 978c7b2ff49597ab76ff7529a933bd366941ac25
Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
Date:   Fri Dec 7 14:09:02 2007 +1100

[XFS] Make xfsbufd threads freezable

Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69
that did not introduce the necessary call to set_freezable() in
xfs/linux-2.6/xfs_buf.c .

SGI-PV: 974224
SGI-Modid: xfs-linux-melb:xfs-kern:30203a

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Signed-off-by: David Chinner <[EMAIL PROTECTED]>
Signed-off-by: Lachlan McIlroy <[EMAIL PROTECTED]>

commit e89bc612d61edbcefaeb6f2244f86c0f3ec89d23
Author: Christoph Hellwig <[EMAIL PROTECTED]>
Date:   Fri Dec 7 14:07:53 2007 +1100

[XFS] revert to double-buffering readdir

The current readdir implementation deadlocks on btree buffer locks
because nfsd calls back into ->lookup from the filldir callback. The only
short-term fix for this is to revert to the old inefficient
double-buffering scheme.

SGI-PV: 973377
SGI-Modid: xfs-linux-melb:xfs-kern:30201a

Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>
Signed-off-by: David Chinner <[EMAIL PROTECTED]>
Signed-off-by: Lachlan McIlroy <[EMAIL PROTECTED]>
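The deadlock and the double-buffering workaround can be sketched as follows. This is a hypothetical single-threaded model: a flag stands in for the btree buffer lock, and a counter records each time the filldir callback (like nfsd's) calls back into lookup while that lock is held, which on a real non-recursive lock would hang. Copying entries into a private buffer first lets the lock be dropped before any callback runs, at the cost of an extra copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_ENTRIES 16
#define MODEL_NAME_LEN    32

static bool btree_locked;    /* stands in for the btree buffer lock */
static int would_deadlock;   /* lookups attempted while the lock was held */

static void lookup_model(void)
{
	if (btree_locked)
		would_deadlock++;  /* a non-recursive lock would block here */
}

/* Like nfsd's filldir callback: it calls back into lookup. */
static int filldir_model(const char *name, void *ctx)
{
	(void)name;
	(void)ctx;
	lookup_model();
	return 0;
}

/* Single-buffered: callbacks run with the lock held. */
static size_t readdir_under_lock(const char *const *dir, size_t n,
				 int (*filldir)(const char *, void *), void *ctx)
{
	size_t i;

	btree_locked = true;
	for (i = 0; i < n; i++)
		if (filldir(dir[i], ctx) != 0)
			break;
	btree_locked = false;
	return i;
}

/* Double-buffered: copy out under the lock, unlock, then call filldir. */
static size_t readdir_double_buffered(const char *const *dir, size_t n,
				      int (*filldir)(const char *, void *),
				      void *ctx)
{
	char buf[MODEL_MAX_ENTRIES][MODEL_NAME_LEN];
	size_t count;

	btree_locked = true;
	for (count = 0; count < n && count < MODEL_MAX_ENTRIES; count++) {
		strncpy(buf[count], dir[count], MODEL_NAME_LEN - 1);
		buf[count][MODEL_NAME_LEN - 1] = '\0';
	}
	btree_locked = false;

	for (size_t i = 0; i < count; i++)
		if (filldir(buf[i], ctx) != 0)
			break;
	return count;
}
```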

commit a7430847fcb19297d6db833f35b9c9645c4a6395
Author: David Chinner <[EMAIL PROTECTED]>
Date:   Fri Nov 23 16:30:23 2007 +1100

[XFS] Fix broken inode cluster setup.

The radix tree based inode caches did away with the inode cluster hashes,
replacing them with a bunch of masking and gang lookups on the radix tree.

This masking got broken when moving the code to per-ag radix trees and
indexing by agino # rather than straight inode number. The result is
clustered inode writeback does not cluster and things can go extremely
slowly when there are lots of inodes to write.

Fix it up by comparing the agino # of the inode we just looked up to the
index of the cluster we are looking for.

Tested-by: Torsten Kaiser <[EMAIL PROTECTED]>

SGI-PV: 972915
SGI-Modid: xfs-linux-melb:xfs-kern:30033a

Signed-off-by: David Chinner <[EMAIL PROTECTED]>
Signed-off-by: Lachlan McIlroy <[EMAIL PROTECTED]>
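A sketch of the comparison the fix describes, assuming inodes_per_cluster is a power of two and that the gang lookup returns aginos in ascending order (the function name is hypothetical): mask each returned agino down to its cluster's first index and stop as soon as it differs from the index of the cluster being written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many leading entries of a sorted gang-lookup result belong to
 * the cluster that contains first_agino. */
static size_t count_cluster_members(const uint32_t *aginos, size_t n,
				    uint32_t first_agino,
				    uint32_t inodes_per_cluster)
{
	uint32_t mask = ~(inodes_per_cluster - 1);
	uint32_t first_index = first_agino & mask;
	size_t i;

	for (i = 0; i < n; i++) {
		/* This inode lives in a later cluster: stop here. */
		if ((aginos[i] & mask) != first_index)
			break;
	}
	return i;
}
```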

commit 77be55a5a13d9c7ddf780a93861f2fba33f8be1a
Author: Lachlan McIlroy <[EMAIL PROTECTED]>
Date:   Fri Nov 23 16:31:00 2007 +1100

[XFS] Clear XBF_READ_AHEAD flag on I/O completion.

SGI-PV: 972554
SGI-Modid: xfs-linux-melb:xfs-kern:30128a

Signed-off-by: Lachlan McIlroy <[EMAIL PROTECTED]>
Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

commit d1afb678ce77b930334a8a640a05b8e68178a377
Author: Lachlan McIlroy <[EMAIL PROTECTED]>
Date:   Tue Nov 27 17:01:24 2007 +1100

[XFS] Fixed a few bugs in xfs_buf_associate_memory()

- calculation of 'page_count' was incorrect as it did not
  consider the offset of 'mem' into the first page. The
  logic to bump 'page_count' didn't work if 'len' was <=
  PAGE_CACHE_SIZE (i.e. offset = 3k, len = 2k).
- setting b_buffer_length to 'len' is incorrect if 'offset'
  is > 0. Set it to the total length of the buffer.
- I suspect that passing a non-aligned address into
  mem_to_page() for the first page may have been causing
  issues - don't know but just tidy up that code anyway.

SGI-PV: 971596
SGI-Modid: xfs-linux-melb:xfs-kern:30143a

Signed-off-by: Lachlan McIlroy <[EMAIL PROTECTED]>
Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>
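The first bullet above, worked through in C under the assumption of 4 KiB pages (the function names are hypothetical, not XFS's): with offset = 3k and len = 2k, rounding len alone yields one page, but the buffer actually spans two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Ignores where the buffer starts inside its first page. */
static size_t pages_naive(size_t len)
{
	return (len + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Accounts for the offset of mem into its first page. */
static size_t pages_spanned(uintptr_t mem, size_t len)
{
	size_t offset = mem & (MODEL_PAGE_SIZE - 1);

	return (offset + len + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}
```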

commit cd57e594adc624dd9ee4c0ded3949da21ec24b2f
Author: Lachlan McIlroy <[EMAIL PROTECTED]>
Date:   Fri Nov 23 16:30:32 2007 +1100

[XFS] 971064 Various fixups for 

Re: [PATCH] pci: Fix bus resource assignment on 32 bits with 64b resources

2007-12-09 Thread Benjamin Herrenschmidt

On Mon, 2007-12-10 at 17:15 +1100, Benjamin Herrenschmidt wrote:
> The current pci_assign_unassigned_resources() code doesn't work properly
> on 32-bit platforms with 64-bit resources. The main reason is the use
> of unsigned long in various places instead of resource_size_t.
> 
> This fixes it, along with some tricks to avoid casting to 64 bits in
> every printk on platforms that don't need it.
> 
> This is a prerequisite for making powerpc use the generic code instead of
> its own half-useful implementation.
> 
> Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> ---
> 
> This version now uses casts as Greg asked for and adds proper setup
> of the prefetchable base & limit "upper" registers when using 64-bit
> resources.

Crap! Ignore it. I forgot to quilt ref again... won't build.

Sending it again, with a separate patch fixing a warning in setup-res.c
that isn't directly related to the changes to setup-bus.c

Cheers,
Ben.




[PATCH] pci: Fix bus resource assignment on 32 bits with 64b resources

2007-12-09 Thread Benjamin Herrenschmidt
The current pci_assign_unassigned_resources() code doesn't work properly
on 32-bit platforms with 64-bit resources. The main reason is the use
of unsigned long in various places instead of resource_size_t.

This fixes it, along with some tricks to avoid casting to 64 bits in
every printk on platforms that don't need it.

This is a prerequisite for making powerpc use the generic code instead of
its own half-useful implementation.

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

This version now uses casts as Greg asked for and adds proper setup
of the prefetchable base & limit "upper" registers when using 64-bit
resources.

 drivers/pci/setup-bus.c |   59 +---
 include/linux/pci.h |4 +--
 2 files changed, 38 insertions(+), 25 deletions(-)

Index: linux-work/drivers/pci/setup-bus.c
===
--- linux-work.orig/drivers/pci/setup-bus.c 2007-12-10 16:56:51.0 +1100
+++ linux-work/drivers/pci/setup-bus.c  2007-12-10 17:12:27.0 +1100
@@ -89,8 +89,9 @@ void pci_setup_cardbus(struct pci_bus *b
 * The IO resource is allocated a range twice as large as it
 * would normally need.  This allows us to set both IO regs.
 */
-   printk("  IO window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  IO window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_0,
region.start);
pci_write_config_dword(bridge, PCI_CB_IO_LIMIT_0,
@@ -99,8 +100,9 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[1]);
if (bus->resource[1]->flags & IORESOURCE_IO) {
-   printk("  IO window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  IO window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_1,
region.start);
pci_write_config_dword(bridge, PCI_CB_IO_LIMIT_1,
@@ -109,8 +111,9 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[2]);
if (bus->resource[2]->flags & IORESOURCE_MEM) {
-   printk("  PREFETCH window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  PREFETCH window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_0,
region.start);
pci_write_config_dword(bridge, PCI_CB_MEMORY_LIMIT_0,
@@ -119,8 +122,9 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[3]);
if (bus->resource[3]->flags & IORESOURCE_MEM) {
-   printk("  MEM window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  MEM window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_1,
region.start);
pci_write_config_dword(bridge, PCI_CB_MEMORY_LIMIT_1,
@@ -145,7 +149,7 @@ pci_setup_bridge(struct pci_bus *bus)
 {
struct pci_dev *bridge = bus->self;
struct pci_bus_region region;
-   u32 l, io_upper16;
+   u32 l, bu, lu, io_upper16;
 
DBG(KERN_INFO "PCI: Bridge: %s\n", pci_name(bridge));
 
@@ -159,7 +163,8 @@ pci_setup_bridge(struct pci_bus *bus)
/* Set up upper 16 bits of I/O base/limit. */
io_upper16 = (region.end & 0xffff0000) | (region.start >> 16);
DBG(KERN_INFO "  IO window: %04lx-%04lx\n",
-   region.start, region.end);
+   (unsigned long)region.start,
+   (unsigned long)region.end);
}
else {
/* Clear upper 16 bits of I/O base/limit. */
@@ -180,8 +185,9 @@ pci_setup_bridge(struct pci_bus *bus)
if (bus->resource[1]->flags & IORESOURCE_MEM) {
l = (region.start >> 16) & 0xfff0;
l |= region.end & 0xfff00000;
-   DBG(KERN_INFO "  MEM window: %08lx-%08lx\n",
-   region.start, region.end);
+   DBG(KERN_INFO "  MEM window: 0x%08lx-0x%08lx\n",
+   (unsigned long)region.start,
+   (unsigned 

[PATCH][SCSI] hptiop: add more adapter models and other fixes

2007-12-09 Thread HighPoint Linux Team
Most code changes were made to support adapters based on Marvell IOP, plus some
other fixes.

- add more PCI device IDs
- support for adapters based on Marvell IOP
- fix a result code translation error on big-endian systems
- fix resource releasing bug when scsi_host_alloc() fails in hptiop_probe()
- update scsi_cmnd.resid when finishing a request
- correct some coding style issues
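To read the Marvell BAR1 register map documented in the hptiop.txt hunk of this patch, it can help to express it as a C struct and check the offsets. This is a hypothetical layout for illustration only: field names are not the driver's, and real MMIO access would go through ioremap() and readl()/writel() rather than plain struct accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the Marvell-IOP BAR1 layout. */
struct mv_bar1_model {
	uint32_t inbound_head;          /* 0x0  Inbound Queue Head Pointer */
	uint32_t inbound_tail;          /* 0x4  Inbound Queue Tail Pointer */
	uint32_t outbound_head;         /* 0x8  Outbound Queue Head Pointer */
	uint32_t outbound_tail;         /* 0xC  Outbound Queue Tail Pointer */
	uint32_t inbound_msg;           /* 0x10 Inbound Message Register */
	uint32_t outbound_msg;          /* 0x14 Outbound Message Register */
	uint32_t reserved[10];          /* 0x18-0x3F, not listed in the map */
	uint8_t  inbound_q[0x1000];     /* 0x40-0x1040 Inbound Queue */
	uint8_t  outbound_q[0x1000];    /* 0x1040-0x2040 Outbound Queue */
};

/* Compile-time checks that the struct matches the documented offsets. */
_Static_assert(offsetof(struct mv_bar1_model, inbound_q) == 0x40,
	       "inbound queue must start at 0x40");
_Static_assert(sizeof(struct mv_bar1_model) == 0x2040,
	       "BAR1 map ends at 0x2040");
```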

Signed-off-by: HighPoint Linux Team <[EMAIL PROTECTED]>
---

 Documentation/scsi/hptiop.txt |   30 ++-
 drivers/scsi/Kconfig  |4 +-
 drivers/scsi/hptiop.c |  589 -
 drivers/scsi/hptiop.h |  101 ++--
 4 files changed, 568 insertions(+), 156 deletions(-)

diff --git a/Documentation/scsi/hptiop.txt b/Documentation/scsi/hptiop.txt
index d28a312..a6eb4ad 100644
--- a/Documentation/scsi/hptiop.txt
+++ b/Documentation/scsi/hptiop.txt
@@ -1,9 +1,9 @@
-HIGHPOINT ROCKETRAID 3xxx RAID DRIVER (hptiop)
+HIGHPOINT ROCKETRAID 3xxx/4xxx ADAPTER DRIVER (hptiop)
 
 Controller Register Map
 -
 
-The controller IOP is accessed via PCI BAR0.
+For Intel IOP based adapters, the controller IOP is accessed via PCI BAR0:
 
      BAR0 offset    Register
             0x10    Inbound Message Register 0
@@ -18,6 +18,24 @@ The controller IOP is accessed via PCI BAR0.
             0x40    Inbound Queue Port
             0x44    Outbound Queue Port
 
+For Marvell IOP based adapters, the IOP is accessed via PCI BAR0 and BAR1:
+
+     BAR0 offset    Register
+         0x20400    Inbound Doorbell Register
+         0x20404    Inbound Interrupt Mask Register
+         0x20408    Outbound Doorbell Register
+         0x2040C    Outbound Interrupt Mask Register
+
+     BAR1 offset    Register
+             0x0    Inbound Queue Head Pointer
+             0x4    Inbound Queue Tail Pointer
+             0x8    Outbound Queue Head Pointer
+             0xC    Outbound Queue Tail Pointer
+            0x10    Inbound Message Register
+            0x14    Outbound Message Register
+     0x40-0x1040    Inbound Queue
+   0x1040-0x2040    Outbound Queue
+
 
 I/O Request Workflow
 --
@@ -73,15 +91,9 @@ The driver exposes following sysfs attributes:
  driver-versionR driver version string
  firmware-version  R firmware version string
 
-The driver registers char device "hptiop" to communicate with HighPoint RAID
-management software. Its ioctl routine acts as a general binary interface 
-between the IOP firmware and HighPoint RAID management software. New management
-functions can be implemented in application/firmware without modification
-in driver code.
-
 
 -
-Copyright (C) 2006 HighPoint Technologies, Inc. All Rights Reserved.
+Copyright (C) 2006-2007 HighPoint Technologies, Inc. All Rights Reserved.
 
   This file is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 86cf10e..ba118ba 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -573,10 +573,10 @@ config SCSI_ARCMSR_AER
 source "drivers/scsi/megaraid/Kconfig.megaraid"
 
 config SCSI_HPTIOP
-   tristate "HighPoint RocketRAID 3xxx Controller support"
+   tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
depends on SCSI && PCI
help
- This option enables support for HighPoint RocketRAID 3xxx
+ This option enables support for HighPoint RocketRAID 3xxx/4xxx
  controllers.
 
  To compile this driver as a module, choose M here; the module
diff --git a/drivers/scsi/hptiop.c b/drivers/scsi/hptiop.c
index 0844331..7febfd5 100644
--- a/drivers/scsi/hptiop.c
+++ b/drivers/scsi/hptiop.c
@@ -1,5 +1,5 @@
 /*
- * HighPoint RR3xxx controller driver for Linux
+ * HighPoint RR3xxx/4xxx controller driver for Linux
  * Copyright (C) 2006-2007 HighPoint Technologies, Inc. All Rights Reserved.
  *
  * This program is free software; you can redistribute it and/or modify
@@ -38,80 +38,84 @@
 #include "hptiop.h"
 
 MODULE_AUTHOR("HighPoint Technologies, Inc.");
-MODULE_DESCRIPTION("HighPoint RocketRAID 3xxx SATA Controller Driver");
+MODULE_DESCRIPTION("HighPoint RocketRAID 3xxx/4xxx Controller Driver");
 
 static char driver_name[] = "hptiop";
-static const char driver_name_long[] = "RocketRAID 3xxx SATA Controller driver";
-static const char driver_ver[] = "v1.2 (070830)";
-
-static void hptiop_host_request_callback(struct hptiop_hba *hba, u32 tag);
-static void hptiop_iop_request_callback(struct hptiop_hba *hba, u32 tag);
+static const char driver_name_long[] = "RocketRAID 3xxx/4xxx Controller driver";
+static const char driver_ver[] = "v1.3 (071203)";
+
+static int iop_send_sync_msg(struct hptiop_hba *hba, u32 msg, u32 millisec);
+static void hptiop_finish_scsi_req(struct hptiop_hba *hba, u32 tag,
+   

Re: [PATCH] ITIMER_REAL: convert to use struct pid

2007-12-09 Thread Thomas Gleixner


On Fri, 7 Dec 2007, Oleg Nesterov wrote:

> signal_struct->tsk points to the ->group_leader and thus we have the nasty code
> in de_thread() which has to change it and restart ->real_timer if the leader is
> changed.
> Use "struct pid *leader_pid" instead. This also allows us to kill now unneeded
> send_group_sig_info().
> 
> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

Nice cleanup.

Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>
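Why a struct pid removes the need to touch the timer in de_thread() can be modeled roughly like this. It is a hypothetical userspace sketch, not kernel code: the timer holds a stable pid handle, and when a non-leader thread execs and takes over the leader's pid, only the handle's task pointer moves, so the next SIGALRM reaches the new leader without restarting the timer. The real kill_pid_info() sends a group-wide signal; the model just counts deliveries to the task behind the pid.

```c
#include <assert.h>

/* Hypothetical stand-ins for task_struct, struct pid and signal_struct. */
struct task_model {
	int id;
	int sigalrm_count;
};

struct pid_model {
	struct task_model *task;    /* task currently attached to this pid */
};

struct signal_model {
	struct pid_model *leader_pid;   /* replaces the raw task pointer */
};

/* Timer expiry: deliver to whichever task currently owns the pid. */
static void it_real_fn_model(struct signal_model *sig)
{
	if (sig->leader_pid->task)
		sig->leader_pid->task->sigalrm_count++;
}

/* Exec by a non-leader thread: it takes over the leader's pid. */
static void transfer_leader_model(struct pid_model *pid,
				  struct task_model *new_leader)
{
	pid->task = new_leader;
}
```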

 
>  include/linux/sched.h |3 +--
>  kernel/fork.c |2 +-
>  kernel/itimer.c   |2 +-
>  fs/exec.c |   22 ++
>  kernel/signal.c   |   14 --
>  5 files changed, 5 insertions(+), 38 deletions(-)
> 
> --- PT/include/linux/sched.h~3_itimer_tsk_pid 2007-12-02 14:53:29.0 +0300
> +++ PT/include/linux/sched.h  2007-12-07 19:24:44.0 +0300
> @@ -444,7 +444,7 @@ struct signal_struct {
>  
>   /* ITIMER_REAL timer for the process */
>   struct hrtimer real_timer;
> - struct task_struct *tsk;
> + struct pid *leader_pid;
>   ktime_t it_real_incr;
>  
>   /* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
> @@ -1630,7 +1630,6 @@ extern void block_all_signals(int (*noti
>  extern void unblock_all_signals(void);
>  extern void release_task(struct task_struct * p);
>  extern int send_sig_info(int, struct siginfo *, struct task_struct *);
> -extern int send_group_sig_info(int, struct siginfo *, struct task_struct *);
>  extern int force_sigsegv(int, struct task_struct *);
>  extern int force_sig_info(int, struct siginfo *, struct task_struct *);
>  extern int __kill_pgrp_info(int sig, struct siginfo *info, struct pid *pgrp);
> --- PT/kernel/fork.c~3_itimer_tsk_pid 2007-12-07 19:03:34.0 +0300
> +++ PT/kernel/fork.c  2007-12-07 19:08:57.0 +0300
> @@ -882,7 +882,6 @@ static int copy_signal(unsigned long clo
>   hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>   sig->it_real_incr.tv64 = 0;
>   sig->real_timer.function = it_real_fn;
> - sig->tsk = tsk;
>  
>   sig->it_virt_expires = cputime_zero;
>   sig->it_virt_incr = cputime_zero;
> @@ -1300,6 +1299,7 @@ static struct task_struct *copy_process(
>   if (clone_flags & CLONE_NEWPID)
>   p->nsproxy->pid_ns->child_reaper = p;
>  
> + p->signal->leader_pid = pid;
>   p->signal->tty = current->signal->tty;
>   set_task_pgrp(p, task_pgrp_nr(current));
>   set_task_session(p, task_session_nr(current));
> --- PT/kernel/itimer.c~3_itimer_tsk_pid   2007-10-25 16:22:12.0 +0400
> +++ PT/kernel/itimer.c2007-12-07 19:15:27.0 +0300
> @@ -132,7 +132,7 @@ enum hrtimer_restart it_real_fn(struct h
>   struct signal_struct *sig =
>   container_of(timer, struct signal_struct, real_timer);
>  
> - send_group_sig_info(SIGALRM, SEND_SIG_PRIV, sig->tsk);
> + kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);
>  
>   return HRTIMER_NORESTART;
>  }
> --- PT/fs/exec.c~3_itimer_tsk_pid 2007-12-02 16:07:22.0 +0300
> +++ PT/fs/exec.c  2007-12-07 19:18:48.0 +0300
> @@ -781,26 +781,8 @@ static int de_thread(struct task_struct 
>   zap_other_threads(tsk);
>   read_unlock(&tasklist_lock);
>  
> - /*
> -  * Account for the thread group leader hanging around:
> -  */
> - count = 1;
> - if (!thread_group_leader(tsk)) {
> - count = 2;
> - /*
> -  * The SIGALRM timer survives the exec, but needs to point
> -  * at us as the new group leader now.  We have a race with
> -  * a timer firing now getting the old leader, so we need to
> -  * synchronize with any firing (by calling del_timer_sync)
> -  * before we can safely let the old group leader die.
> -  */
> - sig->tsk = tsk;
> - spin_unlock_irq(lock);
> - if (hrtimer_cancel(&sig->real_timer))
> - hrtimer_restart(&sig->real_timer);
> - spin_lock_irq(lock);
> - }
> -
> + /* Account for the thread group leader hanging around: */
> + count = thread_group_leader(tsk) ? 1 : 2;
>   sig->notify_count = count;
>   while (atomic_read(&sig->count) > count) {
>   __set_current_state(TASK_UNINTERRUPTIBLE);
> --- PT/kernel/signal.c~3_itimer_tsk_pid   2007-12-07 17:20:27.0 +0300
> +++ PT/kernel/signal.c2007-12-07 19:23:40.0 +0300
> @@ -1204,20 +1204,6 @@ send_sig(int sig, struct task_struct *p,
>   return send_sig_info(sig, __si_special(priv), p);
>  }
>  
> -/*
> - * This is the entry point for "process-wide" signals.
> - * They will go to an appropriate thread in the thread group.
> - */
> -int
> -send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> -{
> - int ret;
> - read_lock(&tasklist_lock);
> -  

Re: sparsemem: Make SPARSEMEM_VMEMMAP selectable

2007-12-09 Thread Yasunori Goto
Looks good to me.

Thanks.

Acked-by: Yasunori Goto <[EMAIL PROTECTED]>


> 
> From: Geoff Levand <[EMAIL PROTECTED]>
> 
> SPARSEMEM_VMEMMAP needs to be a selectable config option to
> support building the kernel both with and without sparsemem
> vmemmap support.  This selection is desirable for platforms
> which could be configured one way for platform specific
> builds and the other for multi-platform builds.
> 
> Signed-off-by: Miguel Boton <[EMAIL PROTECTED]>
> Signed-off-by: Geoff Levand <[EMAIL PROTECTED]>
> ---
> 
> Andrew, 
> 
> Please consider for 2.6.24.
> 
> -Geoff
> 
> 
>  mm/Kconfig |   15 +++
>  mm/Kconfig |   15 +++++++--------
> 
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -112,18 +112,17 @@ config SPARSEMEM_EXTREME
>   def_bool y
>   depends on SPARSEMEM && !SPARSEMEM_STATIC
>  
> -#
> -# SPARSEMEM_VMEMMAP uses a virtually mapped mem_map to optimise pfn_to_page
> -# and page_to_pfn.  The most efficient option where kernel virtual space is
> -# not under pressure.
> -#
>  config SPARSEMEM_VMEMMAP_ENABLE
>   def_bool n
>  
>  config SPARSEMEM_VMEMMAP
> - bool
> - depends on SPARSEMEM
> - default y if (SPARSEMEM_VMEMMAP_ENABLE)
> + bool "Sparse Memory virtual memmap"
> + depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
> + default y
> + help
> +  SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise
> +  pfn_to_page and page_to_pfn operations.  This is the most
> +  efficient option when sufficient kernel resources are available.
>  
>  # eventually, we can have this option just 'select SPARSEMEM'
>  config MEMORY_HOTPLUG
> 
> 
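With a virtually mapped memmap, pfn_to_page() and page_to_pfn() can become plain pointer arithmetic on a single base address. A minimal sketch, assuming a static array stands in for the virtual mapping (all names are hypothetical; the real vmemmap lives at an architecture-defined virtual address and is populated on demand):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_model {
	unsigned long flags;
};

/* A static array plays the role of the virtually contiguous memmap. */
#define MODEL_MAX_PFN 1024
static struct page_model vmemmap_model[MODEL_MAX_PFN];

/* No section lookup: just offset from the base. */
static struct page_model *pfn_to_page_model(unsigned long pfn)
{
	return vmemmap_model + pfn;
}

static unsigned long page_to_pfn_model(const struct page_model *page)
{
	return (unsigned long)(page - vmemmap_model);
}
```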

-- 
Yasunori Goto 




Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Thomas Gleixner
On Sun, 9 Dec 2007, Parag Warudkar wrote:

> On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> > No problems after disabling CONFIG_HIGH_RES_TIMERS, CONFIG_CPU_IDLE
> > and CONFIG_NO_HZ.
> >
> > I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
> > that way we can at least tell what is required to be hit with this
> > problem.
> 
> Looks like CPU_IDLE=y is necessary for the problem to show up.
> With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems.
> (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short
> freezes on ssh - when I cannot type anything for maybe a second even
> under 100% idle. But the soft lockup doesn't show up in dmesg with this
> configuration.)

Can you please apply the patch below? It prints out the internal
state of the clockevents/timer system when the softlockup is detected.

Thanks,

tglx

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 11df812..82f1a05 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -118,6 +118,7 @@ void softlockup_tick(void)
show_regs(regs);
else
dump_stack();
+   sysrq_timer_list_show();
spin_unlock(_lock);
 }
 


[e2fsprogs PATCH] Userspace solution to time-based UUID without duplicates

2007-12-09 Thread Theodore Tso
On Tue, Nov 20, 2007 at 06:00:12PM -0500, Theodore Tso wrote:
> Basically, the only way to solve this problem 100% in userspace would
> be with a userspace daemon running as a privileged user, and some kind
> of Unix domain socket.
> 
> Patches to implement this in the e2fsprogs UUID library would be
> gratefully accepted.

This patch creates a userspace uuidd which correctly generates
time-based (version 1) UUID's, with the clock sequence number stored
in the filesystem so we correctly detect time going backwards across
processes and even across reboots.
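The clock-sequence rule described above can be sketched in a few lines. This is a minimal illustration of RFC 4122's version 1 rule, not the uuidd code itself: the struct and function names are hypothetical, and the file I/O and locking that let the daemon share this state across processes and reboots are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the persisted state (the commit message
 * names /var/lib/misc/uuid-clock as the on-disk location). */
struct uuid_clock_state {
	uint64_t last_time;  /* last timestamp used, in 100 ns units */
	uint16_t clock_seq;  /* 14-bit clock sequence from RFC 4122 */
};

/*
 * RFC 4122, section 4.2.1: if the saved timestamp is later than the
 * current one, the clock went backwards, so increment the clock sequence
 * to keep (timestamp, clock_seq) unique. Equal timestamps need separate
 * handling (for example a sub-tick counter), which is omitted here.
 */
static uint16_t uuid_next_clock_seq(struct uuid_clock_state *st, uint64_t now)
{
	if (now < st->last_time)
		st->clock_seq = (uint16_t)((st->clock_seq + 1) & 0x3FFF);
	st->last_time = now;
	return st->clock_seq;
}
```

Because the bump only works if every generator sees the same saved timestamp, a single privileged daemon owning the state file is the simplest way to make it hold across unrelated processes.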

I believe this patch is a better solution than Helge's kernel patch
solution or the kludgy patch to e2fsprogs in the SLES RPM which
tries to solve this problem in another way, but which has been
problematic in the past.

Helge, could you try this out and see if it meets your needs?

   - Ted


commit 84e7405d89cb79b43d84e86051bf2f34d9ae5216
Author: Theodore Ts'o <[EMAIL PROTECTED]>
Date:   Mon Dec 10 00:22:16 2007 -0500

Add uuidd daemon to prevent duplicate time-based UUID's

Also store the clock sequence information in a state file in
/var/lib/misc/uuid-clock so that if the time goes backwards the clock
sequence counter can get bumped.  This allows us to completely
correctly generate time-based (version 1) UUID's according to the
algorithm specified in RFC 4122.

Addresses-Sourceforge-Bug: #1529672

Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

diff --git a/debian/changelog b/debian/changelog
index 737242a..04a068b 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+e2fsprogs (1.40.3-2) unstable; urgency=low
+
+  * Add uuidd daemon
+
+ -- Theodore Y. Ts'o <[EMAIL PROTECTED]>  Sun, 09 Dec 2007 22:47:53 -0500
+
 e2fsprogs (1.40.3-1) unstable; urgency=medium
 
   * New upstream release
diff --git a/debian/control b/debian/control
index 448be70..03305d6 100644
--- a/debian/control
+++ b/debian/control
@@ -84,6 +84,19 @@ Description: universally unique id library
  libuuid generates and parses 128-bit universally unique id's (UUID's).
  See RFC 4122 for more information.
 
+Package: uuid-runtime
+Section: libs
+Priority: optional
+Depends: ${shlibs:Depends}
+Replaces: e2fsprogs (<= 1.40.3-1ubuntu1)
+Architecture: any
+Description: universally unique id library
+ libuuid generates and parses 128-bit universally unique id's (UUID's).
+ See RFC 4122 for more information.
+ .
+ This package contains the uuidd daemon which is used by libuuid as well as
+ the uuidgen program.
+
 Package: libuuid1-udeb
 Section: debian-installer
 Priority: optional
diff --git a/debian/rules b/debian/rules
index 842965e..ebbe062 100755
--- a/debian/rules
+++ b/debian/rules
@@ -354,7 +354,7 @@ binary-arch: install install-udeb
DH_OPTIONS= dh_installchangelogs -pe2fsprogs \
-plibblkid${BLKID_SOVERSION} -plibcomerr${COMERR_SOVERSION} \
-plibss${SS_SOVERSION} -plibuuid${UUID_SOVERSION} \
-   -pe2fslibs -puuid-dev -pe2fsck-static
+   -pe2fslibs -puuid-dev -puuid-runtime -pe2fsck-static
 
dh_fixperms
 ifneq ($(ismips),)
diff --git a/debian/uuid-runtime.copyright b/debian/uuid-runtime.copyright
new file mode 100644
index 000..f346739
--- /dev/null
+++ b/debian/uuid-runtime.copyright
@@ -0,0 +1,38 @@
+This package was added to the e2fsprogs debian source package by
+Theodore Ts'o <[EMAIL PROTECTED]> on Sat Mar 15 15:33:37 EST 2003
+
+It is part of the main e2fsprogs distribution, which can be found at:
+
+   http://sourceforge.net/projects/e2fsprogs
+
+Upstream Author: Theodore Ts'o <[EMAIL PROTECTED]>
+
+Copyright:
+
+Copyright (C) 1999, 2000, 2003, 2004 by Theodore Ts'o
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice, and the entire permission notice in its entirety,
+   including the disclaimer of warranties.
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+3. The name of the author may not be used to endorse or promote
+   products derived from this software without specific prior
+   written permission.
+
+THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF
+WHICH ARE HEREBY DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, 

[PATCH] knfsd: Change mailing list for nfsd in MAINTAINERS

2007-12-09 Thread NeilBrown

[EMAIL PROTECTED] is being decommissioned.

I wonder if the website should be changed to linux-nfs.org ...

Cc: "J. Bruce Fields" <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./MAINTAINERS |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/MAINTAINERS ./MAINTAINERS
--- .prev/MAINTAINERS   2007-12-10 16:03:52.0 +1100
+++ ./MAINTAINERS   2007-12-10 16:04:07.0 +1100
@@ -2259,7 +2259,7 @@ P:J. Bruce Fields
 M: [EMAIL PROTECTED]
 P: Neil Brown
 M: [EMAIL PROTECTED]
-L: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
 W: http://nfs.sourceforge.net/
 S: Supported
 


[patch] x64/page.h: convert some macros to inlines

2007-12-09 Thread Randy Dunlap
On Fri, 7 Dec 2007 16:31:42 -0800 Andrew Morton wrote:

> Could someone *please* start a little project of extirpating this utter
> brain damage?  Convert those macros to typechecked static inlines on x86
> (at least) so this sort of thing (which happens again and again and again)
> is lessened?

Here's a start on it.  x86 only and only 4 functions so far.
Builds cleanly for i386 and x86_64.
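To illustrate what the conversion buys, here is a userspace sketch assuming 4 KiB pages. clear_page_macro is a hypothetical name for the old macro form. Its (void *) cast silently accepts an integer such as a pfn, while the inline version's void * parameter makes the compiler diagnose an int-to-pointer conversion at the call site.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages, as on x86 */

/* Macro style: the cast hides type errors, e.g. clear_page_macro(pfn). */
#define clear_page_macro(page) memset((void *)(page), 0, PAGE_SIZE)

/* Inline style: the argument type is checked at every call site. */
static inline void clear_page(void *page)
{
	memset(page, 0, PAGE_SIZE);
}

static inline void copy_page(void *to, void *from)
{
	memcpy(to, from, PAGE_SIZE);
}
```

The generated code is the same either way; the gain is purely in compile-time checking.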

---

From: Randy Dunlap <[EMAIL PROTECTED]>

Convert clear_page/copy_page macros to inline functions for type-checking.
Andrew wants to extirpate these ugly macros.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 include/asm-x86/page_32.h |   39 +--
 include/asm-x86/page_64.h |   19 +++
 2 files changed, 48 insertions(+), 10 deletions(-)

--- linux-2.6.24-rc4-git6.orig/include/asm-x86/page_32.h
+++ linux-2.6.24-rc4-git6/include/asm-x86/page_32.h
@@ -12,12 +12,21 @@
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
 
+#include 
+
 #ifdef CONFIG_X86_USE_3DNOW
 
 #include 
 
-#define clear_page(page)   mmx_clear_page((void *)(page))
-#define copy_page(to,from) mmx_copy_page(to,from)
+static inline void clear_page(void *page)
+{
+   mmx_clear_page(page);
+}
+
+static inline void copy_page(void *to, void *from)
+{
+   mmx_copy_page(to, from);
+}
 
 #else
 
@@ -26,13 +35,31 @@
  * Maybe the K6-III ?
  */
  
-#define clear_page(page)   memset((void *)(page), 0, PAGE_SIZE)
-#define copy_page(to,from) memcpy((void *)(to), (void *)(from), PAGE_SIZE)
+static inline void clear_page(void *page)
+{
+   memset(page, 0, PAGE_SIZE);
+}
+
+static inline void copy_page(void *to, void *from)
+{
+   memcpy(to, from, PAGE_SIZE);
+}
 
 #endif
 
-#define clear_user_page(page, vaddr, pg)   clear_page(page)
-#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
+struct page;
+
+static void inline clear_user_page(void *page, unsigned long vaddr,
+   struct page *pg)
+{
+   clear_page(page);
+}
+
+static void inline copy_user_page(void *to, void *from, unsigned long vaddr,
+   struct page *topage)
+{
+   copy_page(to, from);
+}
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
--- linux-2.6.24-rc4-git6.orig/include/asm-x86/page_64.h
+++ linux-2.6.24-rc4-git6/include/asm-x86/page_64.h
@@ -42,11 +42,22 @@
 
 extern unsigned long end_pfn;
 
-void clear_page(void *);
-void copy_page(void *, void *);
+void clear_page(void *page);
+void copy_page(void *to, void *from);
 
-#define clear_user_page(page, vaddr, pg)   clear_page(page)
-#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
+struct page;
+
+static void inline clear_user_page(void *page, unsigned long vaddr,
+   struct page *pg)
+{
+   clear_page(page);
+}
+
+static void inline copy_user_page(void *to, void *from, unsigned long vaddr,
+   struct page *topage)
+{
+   copy_page(to, from);
+}
 
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)


Re: 2.6.24-rc3-git4 NFS crossmnt regression

2007-12-09 Thread Neil Brown
On Sunday December 9, [EMAIL PROTECTED] wrote:
> On Saturday 08 December 2007 01:43:28 Rafael J. Wysocki wrote:
> > On Saturday, 8 of December 2007, Andrew Morton wrote:
> > > On Fri, 07 Dec 2007 17:51:58 -0500
> > > Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > > 
> > > > 
> > > > On Fri, 2007-12-07 at 14:39 -0500, Shane wrote:
> > > > > On Dec 7, 2007 2:16 PM, Shane <[EMAIL PROTECTED]> wrote:
> > > > > ...
> > > > > > Confirmed working in rc4-git5.  I'll deploy this kernel in a few more
> > > > > > spots and check for other regressions.
> > > > > 
> > > > > Hmm, I installed a new kernel built from the same sources on the NFS
> > > > > server. And now I don't see anything at all in the crossmnt dirs.
> > > > > 
> > > > > ls /dirA/dirB/dirC  --> zero output (empty dir)
> > > > > 
> > > > > Are there any other pending fixes?
> 
> Hi,
> Because I was bitten by this bug (I thought it was a feature) and lack some
> understanding of NFS4, I want to ask a few questions about NFS:
> 
> 1) I want to export a whole file system with submounts to a range of clients.
> The exports manual says I can't do so; is that true?

You should be able to do this successfully with nfs-utils 1.1 or
later.  Where does the manual say you cannot - we should fix that.


> 
> Can you tell me how to properly use crossmnt and nohide?

It is best not to use nohide - we should probably mark it as
'legacy'.

Simply export the top level mountpoint as 'crossmnt'  and everything
below there will be exported.

> Where should I put those options: in the root file system export or in the
> submount export?

crossmnt goes at the top.  nohide goes in the submount.  Both have
the same general effect, though with subtle differences.
You don't need both (though having both doesn't hurt).
Just use crossmnt at the top.  Then you don't need to mention the
lower level filesystems at all.

> 
> 2) NFS4 - I can't get it working:
> 
> *I have an LFS system, and this is what I did (NFS3 works fine, but crossmnt
> and nohide seem not to work, probably due to the above bug).
>   I have also seen errors about stale handles.
> *Kernel - 2.6.24-rc3 with NFS3/4 client/server enabled on both host and 
> guest. (both client and server running this kernel)
> *rpc.idmapd running on both client and server + all standard NFS3 tools
> *NFS tools 1.1.1 with nfs4 support compiled + without GSS (on server)
> * /etc/exports with fsid=0: (on server)
>   /tmp *(fsid=0,insecure,rw,async,anonuid=100,anongid=1000)
> * mounting with -tnfs4 server:/ /mnt/tmp
> 
> It still doesn't work; Wireshark shows that
>   NFSV4 COMPOUND call with
>   Opcode: PUTROOTFH (24)
>   Opcode: GETFH (10)
>   Opcode: GETATTR (9)
> 
> Fails with 
>   Reject State: AUTH_ERROR (1)
>   Auth State: bad credential (seal broken) (1)
> 
> 
> Any ideas?
> 
> (I decided to switch to NFS4 only due to the lack of ability to see 
> underlying mounts)
> 

All of this should work fine with v3.  Once you have the right patch
for the crossmnt bug applied, if you have further problems post them
to [EMAIL PROTECTED]

NeilBrown


Re: [RFC] swap image signature check upon resume

2007-12-09 Thread Borislav Petkov
On Sun, Dec 09, 2007 at 10:46:35PM +0100, Rafael J. Wysocki wrote:
> On Sunday, 9 of December 2007, Borislav Petkov wrote:
> > On Sun, Dec 09, 2007 at 03:27:57PM +0100, Rafael J. Wysocki wrote:
> > ...
> > 
> > > > Instead, I'd rather issue a warning that the swsusp header mismatches,
> > > > saying which kernel the machine was suspended with, and then start the
> > > > countdown for reboot.
> > > 
> > > What exactly would that change?  You need to reboot anyway and fsck will
> > > run on the filesystems regardless of which kernel you boot with.
> > 
> > well, you'll have the chance to reboot with the kernel the machine got
> > suspended with and then the swsusp image header _will_ match so no fsck-ing.
> > or am i missing something...
> 
> Yes, you are. :-)
> 
> With the new code (which BTW I'm assuming we are talking about) the images are
> not matched against the kernel they were created by, but against a hard-coded
> magic number (defined in suspend_64.c) playing the role of the "header
> protocol version" and against some system parameters, like the amount of RAM etc.
> Since all kernels containing the new code use the same magic number, all of
> them will match or none of them will match.

right, i was kinda wondering when actually a swsusp image won't match after
looking at check_image_kernel() but missed that arch-specific RESTORE_MAGIC bit.
Thanks for clearing that up.
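The point Rafael makes can be sketched as a header check that compares a protocol magic and system parameters but never the kernel version. Every name and value below is hypothetical; the real per-arch constant and header layout are not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical value standing in for the arch-specific magic. */
#define SKETCH_RESTORE_MAGIC 0x23456789ULL

/* Hypothetical subset of what a swsusp image header might record. */
struct sketch_image_header {
	uint64_t magic;          /* plays the "header protocol version" role */
	unsigned long num_pages; /* RAM size when the image was written */
};

/* Note what is *not* compared: the kernel version or build, so every
 * kernel carrying the same magic accepts (or rejects) the same image. */
static int sketch_check_image(const struct sketch_image_header *h,
			      unsigned long cur_num_pages)
{
	if (h->magic != SKETCH_RESTORE_MAGIC)
		return -EINVAL;
	if (h->num_pages != cur_num_pages)
		return -EINVAL;
	return 0;
}
```

Under this model, rebooting into "the kernel the image was made with" changes nothing, which is why the warning-and-reboot idea would not avoid the fsck.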

-- 
Regards/Gruß,
Boris.


[patch 2.6.24-rc4-mm 6/6] gpiolib: replacement mcp23s08 driver

2007-12-09 Thread David Brownell
From: David Brownell <[EMAIL PROTECTED]>

Basic driver for 8-bit SPI based MCP23S08 GPIO expander, without support for
IRQs or the shared chipselect mechanism.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
---
Other than the new directory and Kconfig symbol, this is identical to the
code currently in MM except for:  (a) gpio_desc[] related changes, and
(b) now using module_init(), since this directory is initialized earlier.
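The mcp23s08_read() and mcp23s08_write() helpers in this patch frame each SPI transfer as an opcode byte (mcp->addr, with bit 0 set for reads), a register number and, for writes, one data byte. Output changes go through a read-modify-write of the cached OLAT value, as __mcp23s08_set() does. A userspace sketch of that framing follows. It assumes the datasheet's control-byte layout (0100 0 A1 A0 R/W) to build the opcode, since how the driver computes mcp->addr is not shown in this excerpt; all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MCP_GPIO 0x09
#define MCP_OLAT 0x0a

/* Assumption (MCP23S08 datasheet): opcode 0100 0 A1 A0 R/W. */
static uint8_t mcp_opcode(unsigned hw_addr)
{
	return (uint8_t)(0x40 | ((hw_addr & 0x3) << 1));
}

/* Same layout as mcp23s08_write(): opcode, register, value. */
static void mcp_frame_write(uint8_t addr, uint8_t reg, uint8_t val, uint8_t tx[3])
{
	tx[0] = addr;        /* bit 0 clear: write */
	tx[1] = reg;
	tx[2] = val;
}

/* Same layout as mcp23s08_read(): opcode with R/W bit set, register. */
static void mcp_frame_read(uint8_t addr, uint8_t reg, uint8_t tx[2])
{
	tx[0] = addr | 0x01; /* bit 0 set: read */
	tx[1] = reg;
}

/* Read-modify-write of the cached output latch, like __mcp23s08_set(). */
static uint8_t mcp_update_olat(uint8_t olat, unsigned mask, int value)
{
	return value ? (uint8_t)(olat | mask) : (uint8_t)(olat & ~mask);
}
```

Caching OLAT avoids an extra SPI read per bit change, which is why the driver holds its mutex across the update and the write.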

 drivers/gpio/Kconfig |7 
 drivers/gpio/Makefile|1 
 drivers/gpio/mcp23s08.c  |  357 +++
 include/linux/spi/mcp23s08.h |   24 ++
 4 files changed, 389 insertions(+)

--- a/drivers/gpio/Kconfig  2007-12-09 19:51:05.0 -0800
+++ b/drivers/gpio/Kconfig  2007-12-09 19:51:06.0 -0800
@@ -34,4 +34,11 @@ config GPIO_PCF857X
 
 comment "SPI GPIO expanders:"
 
+config GPIO_MCP23S08
+   tristate "Microchip MCP23S08 I/O expander"
+   depends on SPI_MASTER
+   help
+ SPI driver for Microchip MCP23S08 I/O expander.  This provides
+ a GPIO interface supporting inputs and outputs.
+
 endmenu
--- a/drivers/gpio/Makefile 2007-12-09 19:51:05.0 -0800
+++ b/drivers/gpio/Makefile 2007-12-09 19:51:06.0 -0800
@@ -1,3 +1,4 @@
 # gpio support: dedicated expander chips, etc
 
+obj-$(CONFIG_GPIO_MCP23S08) += mcp23s08.o
 obj-$(CONFIG_GPIO_PCF857X) += pcf857x.o
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/drivers/gpio/mcp23s08.c   2007-12-09 19:51:06.0 -0800
@@ -0,0 +1,357 @@
+/*
+ * mcp23s08.c - SPI gpio expander driver
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+
+/* Registers are all 8 bits wide.
+ *
+ * The mcp23s17 has twice as many bits, and can be configured to work
+ * with either 16 bit registers or with two adjacent 8 bit banks.
+ *
+ * Also, there are I2C versions of both chips.
+ */
+#define MCP_IODIR  0x00 /* init/reset:  all ones */
+#define MCP_IPOL   0x01
+#define MCP_GPINTEN 0x02
+#define MCP_DEFVAL 0x03
+#define MCP_INTCON 0x04
+#define MCP_IOCON  0x05
+#  define IOCON_SEQOP  (1 << 5)
+#  define IOCON_HAEN   (1 << 3)
+#  define IOCON_ODR    (1 << 2)
+#  define IOCON_INTPOL (1 << 1)
+#define MCP_GPPU   0x06
+#define MCP_INTF   0x07
+#define MCP_INTCAP 0x08
+#define MCP_GPIO   0x09
+#define MCP_OLAT   0x0a
+
+struct mcp23s08 {
+   struct spi_device   *spi;
+   u8  addr;
+
+   /* lock protects the cached values */
+   struct mutex        lock;
+   u8  cache[11];
+
+   struct gpio_chip    chip;
+
+   struct work_struct  work;
+};
+
+static int mcp23s08_read(struct mcp23s08 *mcp, unsigned reg)
+{
+   u8  tx[2], rx[1];
+   int status;
+
+   tx[0] = mcp->addr | 0x01;
+   tx[1] = reg;
+   status = spi_write_then_read(mcp->spi, tx, sizeof tx, rx, sizeof rx);
+   return (status < 0) ? status : rx[0];
+}
+
+static int mcp23s08_write(struct mcp23s08 *mcp, unsigned reg, u8 val)
+{
+   u8  tx[3];
+
+   tx[0] = mcp->addr;
+   tx[1] = reg;
+   tx[2] = val;
+   return spi_write_then_read(mcp->spi, tx, sizeof tx, NULL, 0);
+}
+
+static int
+mcp23s08_read_regs(struct mcp23s08 *mcp, unsigned reg, u8 *vals, unsigned n)
+{
+   u8  tx[2];
+
+   if ((n + reg) > sizeof mcp->cache)
+   return -EINVAL;
+   tx[0] = mcp->addr | 0x01;
+   tx[1] = reg;
+   return spi_write_then_read(mcp->spi, tx, sizeof tx, vals, n);
+}
+
+/*--*/
+
+static int mcp23s08_direction_input(struct gpio_chip *chip, unsigned offset)
+{
+   struct mcp23s08 *mcp = container_of(chip, struct mcp23s08, chip);
+   int status;
+
+   mutex_lock(&mcp->lock);
+   mcp->cache[MCP_IODIR] |= (1 << offset);
+   status = mcp23s08_write(mcp, MCP_IODIR, mcp->cache[MCP_IODIR]);
+   mutex_unlock(&mcp->lock);
+   return status;
+}
+
+static int mcp23s08_get(struct gpio_chip *chip, unsigned offset)
+{
+   struct mcp23s08 *mcp = container_of(chip, struct mcp23s08, chip);
+   int status;
+
+   mutex_lock(&mcp->lock);
+
+   /* REVISIT reading this clears any IRQ ... */
+   status = mcp23s08_read(mcp, MCP_GPIO);
+   if (status < 0)
+   status = 0;
+   else {
+   mcp->cache[MCP_GPIO] = status;
+   status = !!(status & (1 << offset));
+   }
+   mutex_unlock(&mcp->lock);
+   return status;
+}
+
+static int __mcp23s08_set(struct mcp23s08 *mcp, unsigned mask, int value)
+{
+   u8 olat = mcp->cache[MCP_OLAT];
+
+   if (value)
+   olat |= mask;
+   else
+   olat &= ~mask;
+   mcp->cache[MCP_OLAT] = olat;
+   return mcp23s08_write(mcp, MCP_OLAT, olat);
+}
+
+static void mcp23s08_set(struct gpio_chip *chip, unsigned 

[patch 2.6.24-rc4-mm 4/6] gpiolib: create empty drivers/gpio

2007-12-09 Thread David Brownell
From: David Brownell <[EMAIL PROTECTED]>

Add an empty drivers/gpio directory for gpiolib based GPIO expanders.
We already have three of them (two I2C, one SPI), and there are dozens
of similar chips that only exist for GPIO expansion.

This won't be the only place to hold such gpio_chip code.  Many external
chips add a few GPIOs as secondary functionality, and platform code
frequently needs to closely integrate GPIO and IRQ support.

This is placed *early* in the build/link sequence since it's common for
other drivers to depend on GPIOs to do their work, so they need to be
initialized early in the device_initcall() sequence.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
Acked-by: Jean Delvare <[EMAIL PROTECTED]>
Cc: Eric Miao <[EMAIL PROTECTED]>
---
 arch/arm/Kconfig  |2 ++
 drivers/Kconfig   |2 ++
 drivers/Makefile  |1 +
 drivers/gpio/Kconfig  |   14 ++
 drivers/gpio/Makefile |1 +
 5 files changed, 20 insertions(+)

--- a/arch/arm/Kconfig  2007-12-09 19:50:39.0 -0800
+++ b/arch/arm/Kconfig  2007-12-09 19:51:04.0 -0800
@@ -1034,6 +1034,8 @@ source "drivers/i2c/Kconfig"
 
 source "drivers/spi/Kconfig"
 
+source "drivers/gpio/Kconfig"
+
 source "drivers/w1/Kconfig"
 
 source "drivers/power/Kconfig"
--- a/drivers/Kconfig   2007-12-09 19:50:39.0 -0800
+++ b/drivers/Kconfig   2007-12-09 19:51:04.0 -0800
@@ -52,6 +52,8 @@ source "drivers/i2c/Kconfig"
 
 source "drivers/spi/Kconfig"
 
+source "drivers/gpio/Kconfig"
+
 source "drivers/w1/Kconfig"
 
 source "drivers/power/Kconfig"
--- a/drivers/Makefile  2007-12-09 19:50:39.0 -0800
+++ b/drivers/Makefile  2007-12-09 19:51:04.0 -0800
@@ -5,6 +5,7 @@
 # Rewritten to use lists instead of if-statements.
 #
 
+obj-$(CONFIG_GPIO_LIB) += gpio/
 obj-$(CONFIG_PCI)  += pci/
 obj-$(CONFIG_PARISC)   += parisc/
 obj-$(CONFIG_RAPIDIO)  += rapidio/
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/drivers/gpio/Kconfig  2007-12-09 19:51:04.0 -0800
@@ -0,0 +1,14 @@
+#
+# platform-neutral GPIO support
+#
+
+menu "GPIO Expanders"
+   depends on GPIO_LIB
+
+# put expanders in the right section, in alphabetical order
+
+comment "I2C GPIO expanders:"
+
+comment "SPI GPIO expanders:"
+
+endmenu
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/drivers/gpio/Makefile 2007-12-09 19:51:04.0 -0800
@@ -0,0 +1 @@
+# gpio support: dedicated expander chips, etc


[patch 2.6.24-rc4-mm 5/6] gpiolib: pcf857x i2c expander driver

2007-12-09 Thread David Brownell
From: David Brownell <[EMAIL PROTECTED]>

This is a new-style I2C driver for the most common 8- and 16-bit I2C-based
"quasi-bidirectional" GPIO expanders:  pcf8574 or pcf8575, and several
compatible models (mostly faster, supporting I2C at up to 1 MHz).

The driver exposes the GPIO signals using the platform-neutral GPIO
programming interface, so they are easily accessed by other kernel code.
The lack of such a flexible kernel API has been a big factor in the
proliferation of board-specific drivers for these chips... stuff that
rarely makes it upstream since it's so ugly.  This driver will let such
boards use standard calls.

Since it's a new-style driver, these devices must be configured as
part of board-specific init.  That eliminates the need for error-prone
manual configuration of module parameters, and makes compatibility with
legacy drivers (pcf8574.c, pcf8575.c) for these chips easier (there's
a clear either/or disjunction).
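The "quasi-bidirectional" behaviour described in the driver's comments (writing a one releases a pin so it can be used as an input) can be modelled in userspace. This is a simplified, hypothetical model, not the driver: it treats each pin as a wired-AND of the chip's latch and whatever external circuitry drives, and ignores the chip's weak pull-up and strong-drive timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of an 8-bit quasi-bidirectional expander. */
struct qb_model {
	uint8_t latch;    /* last byte written; the driver's "out" field */
	uint8_t external; /* 0 where outside circuitry pulls a pin low */
};

/* A pin reads high only if neither the chip nor the outside pulls it low. */
static int qb_get(const struct qb_model *m, unsigned offset)
{
	return !!((m->latch & m->external) & (1u << offset));
}

/* "Input" just means writing a one, as pcf857x_input8() does. */
static void qb_direction_input(struct qb_model *m, unsigned offset)
{
	m->latch |= (uint8_t)(1u << offset);
}

static void qb_set(struct qb_model *m, unsigned offset, int value)
{
	if (value)
		m->latch |= (uint8_t)(1u << offset);
	else
		m->latch &= (uint8_t)~(1u << offset);
}
```

Because the chip has only one write register, the driver must remember the whole byte it last wrote (the software latch) so that changing one pin does not disturb the others.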

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
Acked-by: Jean Delvare <[EMAIL PROTECTED]>
---
 drivers/gpio/Kconfig|   23 +++
 drivers/gpio/Makefile   |2 
 drivers/gpio/pcf857x.c  |  330 
 include/linux/i2c/pcf857x.h |   45 ++
 4 files changed, 400 insertions(+)

--- a/drivers/gpio/Kconfig  2007-12-09 19:51:04.0 -0800
+++ b/drivers/gpio/Kconfig  2007-12-09 19:51:05.0 -0800
@@ -9,6 +9,29 @@ menu "GPIO Expanders"
 
 comment "I2C GPIO expanders:"
 
+config GPIO_PCF857X
+   tristate "PCF857x, PCA857x, and PCA967x I2C GPIO expanders"
+   depends on I2C
+   help
+ Say yes here to provide access to most "quasi-bidirectional" I2C
+ GPIO expanders used for additional digital outputs or inputs.
+ Most of these parts are from NXP, though TI is a second source for
+ some of them.  Compatible models include:
+
+ 8 bits:   pcf8574, pcf8574a, pca8574, pca8574a,
+   pca9670, pca9672, pca9674, pca9674a
+
+ 16 bits:  pcf8575, pcf8575c, pca8575,
+   pca9671, pca9673, pca9675
+
+ Your board setup code will need to declare the expanders in
+ use, and assign numbers to the GPIOs they expose.  Those GPIOs
+ can then be used from drivers and other kernel code, just like
+ other GPIOs, but only accessible from task contexts.
+
+ This driver provides an in-kernel interface to those GPIOs using
+ platform-neutral GPIO calls.
+
 comment "SPI GPIO expanders:"
 
 endmenu
--- a/drivers/gpio/Makefile 2007-12-09 19:51:04.0 -0800
+++ b/drivers/gpio/Makefile 2007-12-09 19:51:05.0 -0800
@@ -1 +1,3 @@
 # gpio support: dedicated expander chips, etc
+
+obj-$(CONFIG_GPIO_PCF857X) += pcf857x.o
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/drivers/gpio/pcf857x.c2007-12-09 19:51:05.0 -0800
@@ -0,0 +1,330 @@
+/*
+ * pcf857x - driver for pcf857x, pca857x, and pca967x I2C GPIO expanders
+ *
+ * Copyright (C) 2007 David Brownell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+/*
+ * The pcf857x, pca857x, and pca967x chips only expose one read and one
+ * write register.  Writing a "one" bit (to match the reset state) lets
+ * that pin be used as an input; it's not an open-drain model, but acts
+ * a bit like one.  This is described as "quasi-bidirectional"; read the
+ * chip documentation for details.
+ *
+ * Many other I2C GPIO expander chips (like the pca953x models) have
+ * more complex register models and more conventional circuitry using
+ * push/pull drivers.  They often use the same 0x20..0x27 addresses as
+ * pcf857x parts, making the "legacy" I2C driver model problematic.
+ */
+struct pcf857x {
+   struct gpio_chip    chip;
+   struct i2c_client   *client;
+   unsigned            out;    /* software latch */
+};
+
+/*-*/
+
+/* Talk to 8-bit I/O expander */
+
+static int pcf857x_input8(struct gpio_chip *chip, unsigned offset)
+{
+   struct pcf857x  *gpio = container_of(chip, struct pcf857x, chip);
+
+   gpio->out |= (1 << offset);
+   return i2c_smbus_write_byte(gpio->client, gpio->out);
+}
+
+static int 

[patch 2.6.24-rc4-mm 2/6] gpiolib: more CONFIG_DEBUG_GPIO diagnostics

2007-12-09 Thread David Brownell
From: David Brownell <[EMAIL PROTECTED]>

Update gpiolib behavior with CONFIG_DEBUG_GPIO to include messages
on some fault paths that are common during bringup:  gpiochip_add,
gpio_request, and the two gpio_direction_* calls.

Also morph that CONFIG symbol into compile-time -DDEBUG.
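Turning CONFIG_DEBUG_GPIO into -DDEBUG means the new pr_debug() calls on the failure paths compile to nothing in normal builds. A userspace sketch of that pattern follows, using a hypothetical sketch_pr_debug() in place of the kernel's pr_debug() and a simplified, hypothetical request function with the same failure-reporting shape as the patched gpio_request().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for pr_debug(): active only when built with -DDEBUG, which is
 * what the lib/Makefile change does when CONFIG_DEBUG_GPIO=y. */
#ifdef DEBUG
#define sketch_pr_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define sketch_pr_debug(...) do { } while (0)
#endif

#define SKETCH_NR_GPIOS 8
static int sketch_requested[SKETCH_NR_GPIOS];

/* Simplified request: report failures, as the patched gpio_request() does. */
static int sketch_gpio_request(unsigned gpio, const char *label)
{
	int status = 0;

	if (gpio >= SKETCH_NR_GPIOS)
		status = -EINVAL;
	else if (sketch_requested[gpio])
		status = -EBUSY;
	else
		sketch_requested[gpio] = 1;

	if (status)
		sketch_pr_debug("gpio_request: gpio-%u (%s) status %d\n",
				gpio, label ? label : "?", status);
	return status;
}
```

Keeping the message on the failure path only means bitbanging success paths pay nothing, even in debug builds.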

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
Cc: Tzachi Perelstein <[EMAIL PROTECTED]>
---
 lib/Kconfig.debug |   10 ++
 lib/Makefile  |4 
 lib/gpiolib.c |   33 -
 3 files changed, 38 insertions(+), 9 deletions(-)

--- a/lib/Kconfig.debug 2007-12-09 19:50:50.0 -0800
+++ b/lib/Kconfig.debug 2007-12-09 19:51:02.0 -0800
@@ -160,10 +160,12 @@ config DEBUG_GPIO
bool "Debug GPIO calls"
depends on DEBUG_KERNEL && GPIO_LIB
help
- Say Y here to add some extra checks to GPIO calls, ensuring that
- GPIOs have been properly initialized before they are used and
- that sleeping calls aren't made from nonsleeping contexts.  This
- can make bitbanged serial protocols slower.
+ Say Y here to add some extra checks and diagnostics to GPIO calls.
+ The checks help ensure that GPIOs have been properly initialized
+ before they are used and that sleeping calls aren't made from
+ nonsleeping contexts.  They can make bitbanged serial protocols
+ slower.  The diagnostics help catch the type of setup errors
+ that are most common when setting up new platforms or boards.
 
 config DEBUG_SLAB_LEAK
bool "Memory leak debugging"
--- a/lib/Makefile  2007-12-09 19:50:50.0 -0800
+++ b/lib/Makefile  2007-12-09 19:51:02.0 -0800
@@ -68,6 +68,10 @@ obj-$(CONFIG_FAULT_INJECTION) += fault-i
 
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
+ifeq ($(CONFIG_DEBUG_GPIO),y)
+CFLAGS_gpiolib.o += -DDEBUG
+endif
+
 lib-$(CONFIG_GPIO_LIB) += gpiolib.o
 
 hostprogs-y:= gen_crc32table
--- a/lib/gpiolib.c 2007-12-09 19:50:59.0 -0800
+++ b/lib/gpiolib.c 2007-12-09 19:51:02.0 -0800
@@ -20,9 +20,12 @@
 
 
 /* When debugging, extend minimal trust to callers and platform code.
+ * Also emit diagnostic messages that may help initial bringup, when
+ * board setup or driver bugs are most common.
+ *
  * Otherwise, minimize overhead in what may be bitbanging codepaths.
  */
-#ifdef CONFIG_DEBUG_GPIO
+#ifdef DEBUG
 #define extra_checks 1
 #else
 #define extra_checks 0
@@ -48,8 +51,11 @@ struct gpio_desc {
 static struct gpio_desc gpio_desc[ARCH_NR_GPIOS];
 
 
-/* Warn when drivers omit gpio_request() calls -- legal but
- * ill-advised when setting direction, and otherwise illegal.
+/* Warn when drivers omit gpio_request() calls -- legal but ill-advised
+ * when setting direction, and otherwise illegal.  Until board setup code
+ * and drivers use explicit requests everywhere (which won't happen when
+ * those calls have no teeth) we can't avoid autorequesting.  This nag
+ * message should motivate switching to explicit requests...
  */
 static void gpio_ensure_requested(struct gpio_desc *desc)
 {
@@ -86,8 +92,10 @@ int gpiochip_add(struct gpio_chip *chip)
 * dynamic allocation.  We don't currently support that.
 */
 
-   if (chip->base < 0 || (chip->base  + chip->ngpio) >= ARCH_NR_GPIOS)
-   return -EINVAL;
+   if (chip->base < 0 || (chip->base  + chip->ngpio) >= ARCH_NR_GPIOS) {
+   status = -EINVAL;
+   goto fail;
+   }
 
spin_lock_irqsave(_lock, flags);
 
@@ -106,6 +114,12 @@ int gpiochip_add(struct gpio_chip *chip)
}
 
spin_unlock_irqrestore(_lock, flags);
+fail:
+   /* failures here can mean systems won't boot... */
+   if (status)
+   pr_err("gpiochip_add: gpios %d..%d (%s) not registered\n",
+   chip->base, chip->base + chip->ngpio - 1,
+   chip->label ? : "generic");
return status;
 }
 EXPORT_SYMBOL_GPL(gpiochip_add);
@@ -172,6 +186,9 @@ int gpio_request(unsigned gpio, const ch
status = -EBUSY;
 
 done:
+   if (status)
+   pr_debug("gpio_request: gpio-%d (%s) status %d\n",
+   gpio, label ? : "?", status);
spin_unlock_irqrestore(_lock, flags);
return status;
 }
@@ -272,6 +289,9 @@ int gpio_direction_input(unsigned gpio)
return status;
 fail:
spin_unlock_irqrestore(_lock, flags);
+   if (status)
+   pr_debug("%s: gpio-%d status %d\n",
+   __FUNCTION__, gpio, status);
return status;
 }
 EXPORT_SYMBOL_GPL(gpio_direction_input);
@@ -307,6 +327,9 @@ int gpio_direction_output(unsigned gpio,
return status;
 fail:
spin_unlock_irqrestore(_lock, flags);
+   if (status)
+   pr_debug("%s: gpio-%d status %d\n",
+   __FUNCTION__, gpio, status);
return status;
 }
 EXPORT_SYMBOL_GPL(gpio_direction_output);

[patch 2.6.24-rc4-mm 3/6] gpiolib: implementor-oriented documentation

2007-12-09 Thread David Brownell
From: David Brownell <[EMAIL PROTECTED]>

Add some documentation highlighting implementors' views of the new gpiolib
stuff.  Such developers may be supporting new gpio controllers, platforms,
or boards.  Obviously there's often some overlap there, but the concerns
for each task aren't identical.

This also fixes a minor bug, which turned up as a discrepancy against
the documentation:  if a gpio_chip doesn't have a get() method, just
return zero when asked to read its value.
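The fixed behaviour amounts to treating get() as an optional method. A minimal sketch follows, with a hypothetical, trimmed-down chip struct rather than the real struct gpio_chip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip descriptor with an optional get() method. */
struct sketch_gpio_chip {
	int (*get)(struct sketch_gpio_chip *chip, unsigned offset);
};

/* Chips without a get() method read as zero instead of crashing. */
static int sketch_gpio_get_value(struct sketch_gpio_chip *chip, unsigned offset)
{
	return chip->get ? chip->get(chip, offset) : 0;
}

/* Example get() for a chip whose pins all read high. */
static int sketch_always_high(struct sketch_gpio_chip *chip, unsigned offset)
{
	(void)chip;
	(void)offset;
	return 1;
}
```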

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
---
 Documentation/gpio.txt |  103 ++---
 lib/gpiolib.c  |   29 +
 2 files changed, 126 insertions(+), 6 deletions(-)

--- a/Documentation/gpio.txt  2007-12-09 19:50:50.0 -0800
+++ b/Documentation/gpio.txt  2007-12-09 19:51:03.0 -0800
@@ -32,7 +32,7 @@ The exact capabilities of GPIOs vary bet
   - Input values are likewise readable (1, 0).  Some chips support readback
 of pins configured as "output", which is very useful in such "wire-OR"
 cases (to support bidirectional signaling).  GPIO controllers may have
-input de-glitch logic, sometimes with software controls.
+input de-glitch/debounce logic, sometimes with software controls.
 
   - Inputs can often be used as IRQ signals, often edge triggered but
 sometimes level triggered.  Such IRQs may be configurable as system
@@ -60,12 +60,13 @@ used on a board that's wired differently
 functionality can be very portable.  Other features are platform-specific,
 and that can be critical for glue logic.
 
-Plus, this doesn't define an implementation framework, just an interface.
+Plus, this doesn't require any implementation framework, just an interface.
 One platform might implement it as simple inline functions accessing chip
 registers; another might implement it by delegating through abstractions
 used for several very different kinds of GPIO controller.  (There is some
-library code supporting such an implementation strategy, but drivers acting
-as clients to the GPIO interface should not care how it's implemented.)
+optional code supporting such an implementation strategy, described later
+in this document, but drivers acting as clients to the GPIO interface must
+not care how it's implemented.)
 
 That said, if the convention is supported on their platform, drivers should
 use it when possible.  Platforms should declare GENERIC_GPIO support in
@@ -152,7 +153,7 @@ Use these calls to access such GPIOs:
 The values are boolean, zero for low, nonzero for high.  When reading the
 value of an output pin, the value returned should be what's seen on the
 pin ... that won't always match the specified output value, because of
-issues including wire-OR and output latencies.
+issues including open-drain signaling and output latencies.
 
 The get/set calls have no error returns because "invalid GPIO" should have
 been reported earlier from gpio_direction_*().  However, note that not all
@@ -328,3 +329,95 @@ a side effect of configuring an add-on b
 
 These calls are purely for kernel space, but a userspace API could be built
 on top of them.
+
+
+GPIO implementor's framework (OPTIONAL)
+===
+As noted earlier, there is an optional implementation framework making it
+easier for platforms to support different kinds of GPIO controller using
+the same programming interface.
+
+As a debugging aid, if debugfs is available a /sys/kernel/debug/gpio file
+will be found there.  That will list all the controllers registered through
+this framework, and the state of the GPIOs currently in use.
+
+
+Controller Drivers: gpio_chip
+-
+In this framework each GPIO controller is packaged as a "struct gpio_chip"
+with information common to each controller of that type:
+
+ - methods to establish GPIO direction
+ - methods used to access GPIO values
+ - flag saying whether calls to its methods may sleep
+ - optional debugfs dump method (showing extra state like pullup config)
+ - label for diagnostics
+
+There is also per-instance data, which may come from device.platform_data:
+the number of its first GPIO, and how many GPIOs it exposes.
+
+The code implementing a gpio_chip should support multiple instances of the
+controller, possibly using the driver model.  That code will configure each
+gpio_chip and issue gpiochip_add().  Removing a GPIO controller should be
+rare; use gpiochip_remove() when it is unavoidable.
+
+Normally a gpio_chip is part of an instance-specific structure with state
+not exposed by the GPIO interfaces, such as addressing, power management,
+and more.  Chips such as codecs will have complex non-GPIO state.
+
+Any debugfs dump method should normally ignore signals which haven't been
+requested as GPIOs.  They can use gpiochip_is_requested(), which returns
+either NULL or the label associated with that GPIO when it was requested.
+
+
+Platform Support
+
+To support 

[patch 2.6.24-rc4-mm 0/6] gpiolib updates

2007-12-09 Thread David Brownell
Following this are several patches updating the current gpio
implementation framework.  Because one of those changes
involves creating a new drivers/gpio directory, the patches
are a bit simpler if two existing patches are first removed
from the MM tree:

mcp23s08-spi-gpio-expander.patch
mcp23s08-spi-gpio-expander-checkpatch-fixes.patch

The patches in this series are:

 - Adding a gpio_desc[] layer, making it easier to mix in
   gpio_chips with different numbers of GPIOs without any
   large holes in the number sequence.  When those holes
   are also reflected in IRQ number sequences, they cost
   a lot of memory.  (Based on work from Eric Miao.)

 - More diagnostics with CONFIG_DEBUG_GPIO.  Handy when
   bringing up a new board, needless later.  (Various
   platforms had similar diagnostics.)

 - A bit of implementor-oriented documentation for this
   gpiolib infrastructure.

 - Create a new drivers/gpio directory for expanders and
   (eventually) other code.  (Strongly encouraged by
   Jean Delvare, to help shrink drivers/i2c/chips.)

 - New-style I2C driver for pcf8574/pcf8575/compatible
   GPIO expanders ... these are pretty widely used.
   (This was previously posted as patch/rfc.)

 - Relocated mcp23s08 driver.  This includes two minor
   functional changes, to handle the gpio_desc changes
   and modified driver init sequence, and it merges the
   minor checkpatch.pl update above.

There's an updated pca9539 driver in the works too; it
should land in the drivers/i2c directory.

I'm thinking there should be a non-functional change to
the gpiolib code:  move it from lib to drivers/gpio.

My question is whether that's better done by replacing
the current patches with one new patch, or by a patch
deleting the current lib/gpiolib.c and adding a new
drivers/gpio/gpiolib.c ... I think the former would
make more sense to anyone looking at GIT history.

- Dave


[patch 2.6.24-rc4-mm 1/6] gpiolib: add gpio_desc[]

2007-12-09 Thread David Brownell
From: David Brownell <[EMAIL PROTECTED]>

Update gpiolib to use a table of per-GPIO "struct gpio_desc" instead of
a table of "struct gpio_chip".

 - Change "is_out" and "requested" from arrays in "struct gpio_chip" to
   bit fields in "struct gpio_desc", eliminating ARCH_GPIOS_PER_CHIP.

 - Stop overloading "requested" flag with "label" tracked for debugfs.

 - Change gpiochip_is_requested() into a regular function, since it
   now accesses data that's not exported from the gpiolib code.  Also
   change its signature, for the same reason.

 - Reduce default ARCH_NR_GPIOS to 256 to shrink gpio_desc table cost.
   On 32-bit platforms without debugfs, that table size is 2KB.
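Here is a rough user-space sketch of the per-GPIO descriptor idea in the list above:

 - "requested" and "is_out" become flag bits in each table entry.
 - The label is stored in its own field.
 - A lookup returns NULL or the label, the way gpiochip_is_requested() now does.

The helper names desc_request() and desc_is_requested() are hypothetical. The sketch uses plain bit masks where the kernel would use atomic bitops, and it has no locking.

Size check: without the label field, an entry is one pointer plus one unsigned long. That is 8 bytes on a 32-bit platform, so 256 entries take 256 * 8 = 2048 bytes, the 2KB quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define ARCH_NR_GPIOS 256

struct gpio_chip;  /* opaque in this sketch */

/* Per-GPIO descriptor: flag bits replace the old per-chip arrays. */
struct gpio_desc {
	struct gpio_chip *chip;
	unsigned long flags;
#define FLAG_REQUESTED 0
#define FLAG_IS_OUT    1
	const char *label;  /* stands in for the debugfs-only label */
};

static struct gpio_desc gpio_desc[ARCH_NR_GPIOS];

/* Mark a GPIO requested and record its label; -1 if already taken. */
static int desc_request(unsigned gpio, const char *label)
{
	struct gpio_desc *desc;

	assert(gpio < ARCH_NR_GPIOS);
	desc = &gpio_desc[gpio];
	if (desc->flags & (1UL << FLAG_REQUESTED))
		return -1;
	desc->flags |= 1UL << FLAG_REQUESTED;
	desc->label = label ? label : "?";
	return 0;
}

/* NULL if the GPIO is free, otherwise the label it was requested with. */
static const char *desc_is_requested(unsigned gpio)
{
	assert(gpio < ARCH_NR_GPIOS);
	if (!(gpio_desc[gpio].flags & (1UL << FLAG_REQUESTED)))
		return NULL;
	return gpio_desc[gpio].label;
}
```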

This makes it easier to work with chips with different numbers of GPIOs,
and to avoid holes in GPIO number sequences.  Those holes can cost a
lot of unusable irq_desc space for GPIOs that act as IRQs, and needing
them can cause surprises when trying to set up multiple GPIO controllers.
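The following sketch illustrates how a per-GPIO table lets chips of different sizes pack together. It does two things:

 - assigns each new chip the lowest free run of GPIO numbers;
 - maps a GPIO number back to its (chip, offset) pair.

The first-fit policy and all names are assumptions for illustration, not the gpiolib allocator. For comparison, under the old scheme each chip took a group of ARCH_GPIOS_PER_CHIP numbers. An 8-GPIO expander registered after a 16-GPIO bank would therefore start at 32 or 64 instead of 16.

```c
#include <assert.h>
#include <stddef.h>

#define ARCH_NR_GPIOS 256

/* Hypothetical minimal chip: only what the lookup needs. */
struct chip {
	const char *label;
	int base;
	int ngpio;
};

/* One slot per GPIO number, as with a per-GPIO descriptor table. */
static struct chip *table[ARCH_NR_GPIOS];

/* First-fit: give the chip the lowest free run; -1 if none is free. */
static int add_chip(struct chip *c)
{
	int base, i;

	assert(c != NULL && c->ngpio > 0);
	for (base = 0; base + c->ngpio <= ARCH_NR_GPIOS; base++) {
		for (i = 0; i < c->ngpio; i++)
			if (table[base + i])
				break;
		if (i == c->ngpio) {
			for (i = 0; i < c->ngpio; i++)
				table[base + i] = c;
			c->base = base;
			return base;
		}
	}
	return -1;
}

/* Map a GPIO number to its chip and the offset within that chip. */
static struct chip *gpio_to_chip(unsigned gpio, unsigned *offset)
{
	struct chip *c;

	if (gpio >= ARCH_NR_GPIOS)
		return NULL;
	c = table[gpio];
	if (!c)
		return NULL;
	*offset = gpio - c->base;
	return c;
}
```

With a 16-GPIO chip registered first, an 8-GPIO expander lands at 16..23, and GPIO 20 resolves to offset 4 on the expander.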

Based on a patch from Eric Miao.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
Cc: Eric Miao <[EMAIL PROTECTED]>
---
 arch/avr32/mach-at32ap/pio.c |7 -
 include/asm-avr32/arch-at32ap/gpio.h |2 
 include/asm-generic/gpio.h   |   49 +---
 lib/gpiolib.c|  213 ---
 4 files changed, 139 insertions(+), 132 deletions(-)

--- a/arch/avr32/mach-at32ap/pio.c  2007-12-09 19:50:50.0 -0800
+++ b/arch/avr32/mach-at32ap/pio.c  2007-12-09 19:50:59.0 -0800
@@ -315,12 +315,15 @@ static void pio_bank_show(struct seq_fil
bank = 'A' + pio->pdev->id;
 
for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
-   if (!gpiochip_is_requested(chip, i))
+   const char *label;
+
+   label = gpiochip_is_requested(chip, i);
+   if (!label)
continue;
 
seq_printf(s, " gpio-%-3d P%c%-2d (%-12s) %s %s %s",
chip->base + i, bank, i,
-   chip->requested[i],
+   label,
(osr & mask) ? "out" : "in ",
(mask & pdsr) ? "hi" : "lo",
(mask & pusr) ? "  " : "up");
--- a/include/asm-avr32/arch-at32ap/gpio.h  2007-12-09 19:50:50.0 -0800
+++ b/include/asm-avr32/arch-at32ap/gpio.h  2007-12-09 19:50:59.0 -0800
@@ -8,7 +8,7 @@
 /* Some GPIO chips can manage IRQs; some can't.  The exact numbers can
  * be changed if needed, but for the moment they're not configurable.
  */
-#define ARCH_NR_GPIOS  (NR_GPIO_IRQS + 2 * ARCH_GPIOS_PER_CHIP)
+#define ARCH_NR_GPIOS  (NR_GPIO_IRQS + 2 * 32)
 
 
 /* Arch-neutral GPIO API, supporting both "native" and external GPIOs. */
--- a/include/asm-generic/gpio.h  2007-12-09 19:50:50.0 -0800
+++ b/include/asm-generic/gpio.h  2007-12-09 19:50:59.0 -0800
@@ -4,21 +4,16 @@
 #ifdef CONFIG_GPIO_LIB
 
 /* Platforms may implement their GPIO interface with library code,
- * at a small performance cost for non-inlined operations.
+ * at a small performance cost for non-inlined operations and some
+ * extra memory (for code and for per-GPIO table entries).
  *
  * While the GPIO programming interface defines valid GPIO numbers
  * to be in the range 0..MAX_INT, this library restricts them to the
- * smaller range 0..ARCH_NR_GPIOS and allocates them in groups of
- * ARCH_GPIOS_PER_CHIP (which will usually be the word size used for
- * each bank of a SOC processor's integrated GPIO modules).
+ * smaller range 0..ARCH_NR_GPIOS.
  */
 
 #ifndef ARCH_NR_GPIOS
-#define ARCH_NR_GPIOS  512
-#endif
-
-#ifndef ARCH_GPIOS_PER_CHIP
-#define ARCH_GPIOS_PER_CHIP  BITS_PER_LONG
+#define ARCH_NR_GPIOS  256
 #endif
 
 struct seq_file;
@@ -36,21 +31,18 @@ struct seq_file;
  * state (such as pullup/pulldown configuration).
  * @base: identifies the first GPIO number handled by this chip; or, if
  * negative during registration, requests dynamic ID allocation.
- * @ngpio: the number of GPIOs handled by this controller; the value must
- * be at most ARCH_GPIOS_PER_CHIP, so the last GPIO handled is
- * (base + ngpio - 1).
+ * @ngpio: the number of GPIOs handled by this controller; the last GPIO
+ * handled is (base + ngpio - 1).
  * @can_sleep: flag must be set iff get()/set() methods sleep, as they
  * must while accessing GPIO expander chips over I2C or SPI
- * @is_out: bit array where bit N is true iff GPIO with offset N has been
- *  called successfully to configure this as an output
  *
  * A gpio_chip can help platforms abstract various sources of GPIOs so
  * they can all be accessed through a common programing interface.
  * Example sources would be SOC controllers, FPGAs, multifunction
  * chips, dedicated GPIO expanders, and so on.
  *
- * Each chip controls a number of signals, numbered [EMAIL PROTECTED], which are
- * identified in method calls by an 

Re: Please revert: PCI: fix IDE legacy mode resources

2007-12-09 Thread Benjamin Herrenschmidt
powerpc: Fix IDE legacy vs. native fixups

PowerMac and CHRP/BriQ platforms have quirks to switch some IDE
controllers from legacy mode to fully native mode. Those quirks
however will not work properly anymore due to a change to the
generic code to better handle legacy IDE resources.

This fixes it by moving those quirks to "early" quirks (so they
run before resources are probed for the devices) and clearing
all BARs after the conversion to force a reallocation of sane
values.
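Here is a small user-space model of what the reworked quirks do. It operates on a byte array standing in for the device's config space, not on real hardware. It does two things:

 - sets prog-if bits 0 and 2 (native mode on both channels, the `progif & 5` test in the patch);
 - zeroes BAR0 through BAR3 so fresh resources get assigned.

The function name is hypothetical, and the read-back check done in the PowerMac code is omitted. In the kernel, clearing the BARs only helps if the fixup runs before resources are probed. That is why the patch moves to DECLARE_PCI_FIXUP_EARLY.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config-space offsets (same values as the kernel's). */
#define PCI_CLASS_PROG      0x09
#define PCI_BASE_ADDRESS_0  0x10

/* Prog-if bits 0 and 2: primary and secondary channel in native mode. */
#define IDE_NATIVE_BOTH     0x05

/*
 * Switch a (simulated) IDE function to fully native mode and clear its
 * first four BARs. Returns 1 if it was switched, 0 if already native.
 */
static int force_native_ide(uint8_t cfg[256])
{
	uint8_t progif;
	int bar;

	assert(cfg != NULL);
	progif = cfg[PCI_CLASS_PROG];
	if ((progif & IDE_NATIVE_BOTH) == IDE_NATIVE_BOTH)
		return 0;
	cfg[PCI_CLASS_PROG] = progif | IDE_NATIVE_BOTH;
	/* Clear BAR0..BAR3; they will be reassigned. */
	for (bar = 0; bar < 4; bar++)
		memset(&cfg[PCI_BASE_ADDRESS_0 + 4 * bar], 0, 4);
	return 1;
}
```

A controller reporting the common legacy prog-if value 0x8a ends up at 0x8f with BAR0..BAR3 zeroed. A second call is a no-op.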

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

To be totally correct, we still need to also revert
commit fd6e732186ab522c812ab19c2c5e5befb8ec8115 which
is bogus.

Linus, can you still apply this to 2.6.24 ? I would also like the
above (fd6e...) reverted, as so far nobody has come up with a demonstration
that it's not bogus.

Index: linux-work/arch/powerpc/platforms/chrp/pci.c
===
--- linux-work.orig/arch/powerpc/platforms/chrp/pci.c   2007-12-10 15:23:21.0 +1100
+++ linux-work/arch/powerpc/platforms/chrp/pci.c  2007-12-10 15:23:29.0 +1100
@@ -317,8 +317,12 @@ chrp_find_bridges(void)
 /* SL82C105 IDE Control/Status Register */
 #define SL82C105_IDECSR  0x40
 
-/* Fixup for Winbond ATA quirk, required for briq */
-void chrp_pci_fixup_winbond_ata(struct pci_dev *sl82c105)
+/* Fixup for Winbond ATA quirk, required for briq mostly because the
+ * 8259 is configured for level sensitive IRQ 14 and so wants the
+ * ATA controller to be set to fully native mode or bad things
+ * will happen.
+ */
+static void __devinit chrp_pci_fixup_winbond_ata(struct pci_dev *sl82c105)
 {
u8 progif;
 
@@ -334,10 +338,15 @@ void chrp_pci_fixup_winbond_ata(struct p
sl82c105->class |= 0x05;
/* Disable SL82C105 second port */
pci_write_config_word(sl82c105, SL82C105_IDECSR, 0x0003);
+   /* Clear IO BARs, they will be reassigned */
+   pci_write_config_dword(sl82c105, PCI_BASE_ADDRESS_0, 0);
+   pci_write_config_dword(sl82c105, PCI_BASE_ADDRESS_1, 0);
+   pci_write_config_dword(sl82c105, PCI_BASE_ADDRESS_2, 0);
+   pci_write_config_dword(sl82c105, PCI_BASE_ADDRESS_3, 0);
}
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105,
-   chrp_pci_fixup_winbond_ata);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105,
+   chrp_pci_fixup_winbond_ata);
 
 /* Pegasos2 firmware version 20040810 configures the built-in IDE controller
  * in legacy mode, but sets the PCI registers to PCI native mode.
@@ -345,7 +354,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_WI
  * mode as well. The same fixup must be done to the class-code property in
  * the IDE node /[EMAIL PROTECTED]/[EMAIL PROTECTED],1
  */
-static void chrp_pci_fixup_vt8231_ata(struct pci_dev *viaide)
+static void __devinit chrp_pci_fixup_vt8231_ata(struct pci_dev *viaide)
 {
u8 progif;
struct pci_dev *viaisa;
@@ -366,4 +375,4 @@ static void chrp_pci_fixup_vt8231_ata(st
 
pci_dev_put(viaisa);
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_1, chrp_pci_fixup_vt8231_ata);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_1, chrp_pci_fixup_vt8231_ata);
Index: linux-work/arch/powerpc/platforms/powermac/pci.c
===
--- linux-work.orig/arch/powerpc/platforms/powermac/pci.c   2007-12-10 15:23:21.0 +1100
+++ linux-work/arch/powerpc/platforms/powermac/pci.c  2007-12-10 15:23:29.0 +1100
@@ -1243,15 +1243,22 @@ void pmac_pci_fixup_pciata(struct pci_de
  good:
pci_read_config_byte(dev, PCI_CLASS_PROG, );
if ((progif & 5) != 5) {
-   printk(KERN_INFO "Forcing PCI IDE into native mode: %s\n",
+   printk(KERN_INFO "PCI: %s Forcing PCI IDE into native mode\n",
   pci_name(dev));
(void) pci_write_config_byte(dev, PCI_CLASS_PROG, progif|5);
if (pci_read_config_byte(dev, PCI_CLASS_PROG, ) ||
(progif & 5) != 5)
printk(KERN_ERR "Rewrite of PROGIF failed !\n");
+   else {
+   /* Clear IO BARs, they will be reassigned */
+   pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, 0);
+   pci_write_config_dword(dev, PCI_BASE_ADDRESS_1, 0);
+   pci_write_config_dword(dev, PCI_BASE_ADDRESS_2, 0);
+   pci_write_config_dword(dev, PCI_BASE_ADDRESS_3, 0);
+   }
}
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, pmac_pci_fixup_pciata);
+DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pmac_pci_fixup_pciata);
 #endif
 
 /*



Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops

2007-12-09 Thread Rene Herman

On 09-12-07 22:25, Pavel Machek wrote:


On Sun 2007-12-09 17:59:08, Andi Kleen wrote:



Yes, i guess switching to udelay at least on newer systems would
be a good idea.  I'm not quite sure about systems without TSC though.


Something like this? (Warning: this probably will not even compile on
x86-64; I do not have a 64-bit compiler near me.)



 static inline void native_io_delay(void)
 {
-   asm volatile("outb %%al,$0x80" : : : "memory");
+   udelay(8);
 }


Alan, did you double-check that 8 us? I tried to, but I don't seem to have
trustworthy documentation.
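Below is a user-space analogue of the proposed change, assuming a POSIX system with CLOCK_MONOTONIC. Instead of relying on however long an I/O write to port 0x80 happens to take, it spins for an explicit number of microseconds. spin_udelay() and now_ns() are hypothetical helpers. The 8 us figure comes from the patch and, as asked above, is not yet confirmed against documentation.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/*
 * User-space stand-in for udelay(): busy-wait until at least `us`
 * microseconds have elapsed, so the delay length is explicit rather than
 * a side effect of an I/O port write.
 */
static void spin_udelay(unsigned us)
{
	int64_t end = now_ns() + (int64_t)us * 1000;

	while (now_ns() < end)
		;
}
```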


Rene.


[PATCH acs_ame scsi driver 000 of 1] Introduction

2007-12-09 Thread chang jeff
Dear Sir,

Following is a patch for a SCSI driver in 2.6.23.9 that should get into
2.6.23.9-final if possible.

First, it refines a newly created SCSI driver
(/[kernel-version]/driver/scsi/acs_ame).

Thanks,
JeffChang.


Re: [lm-sensors] 2.6.24-rc4 hwmon it87 probe fails

2007-12-09 Thread Mike Houston
On Mon, 10 Dec 2007 10:31:27 +0800
Shaohua Li <[EMAIL PROTECTED]> wrote:
> This should exist in previous kernel (before we remove acpi
> motherboard driver) too. Basically it's a broken BIOS. Could below
> patch work around it?
> 
> Thanks,
> Shaohua
> 
> Index: linux/drivers/pnp/system.c
> ===
> --- linux.orig/drivers/pnp/system.c   2007-12-10 10:17:46.0 +0800
> +++ linux/drivers/pnp/system.c

Thanks Shaohua, I tested this as well and it appears to have worked
around the issue for me.

Now, in dmesg, I get:

system 00:01: ioport range 0x290-0x29f has been reserved
(...)
system 00:01: ioport range 0x290-0x294 could not be reserved

In /proc/ioports I see:

0290-029f : pnp 00:01
  0290-0297 : it87
0290-0297 : it87

The it87 sensor now works without disabling acpipnp
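Here is a rough model of the conflict discussed in this thread, using the ranges from the dmesg and /proc/ioports output. The it87 region 0x290-0x297 fits inside the 0x290-0x29f reservation. It overlaps 0x290-0x294 without fitting inside it. The helper names are hypothetical. They ignore the busy flags and nesting that the kernel's resource tree actually tracks.

```c
#include <assert.h>

/* Inclusive I/O port range, as printed in dmesg and /proc/ioports. */
struct port_range {
	unsigned start;
	unsigned end;
};

/* Two inclusive ranges overlap iff each starts before the other ends. */
static int ranges_overlap(struct port_range a, struct port_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}

/* `inner` lies entirely within `outer`, so it can nest under it. */
static int range_contains(struct port_range outer, struct port_range inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}

/* Rough model: a request conflicts if it overlaps without nesting. */
static int would_conflict(struct port_range existing, struct port_range req)
{
	return ranges_overlap(existing, req) && !range_contains(existing, req);
}
```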

Mike Houston


Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Alan Cox
> Have you even *read* the thread?

In detail, as it unfolds and while testing variants of Tejun's code on
the hardware I have access to - none of which has this bug, making it
rather trickier to help.

> In other words, the stuff you call so critically important (yet we've been 
> able to live without it until now!) is apparently simply NOT YET READY. 
> It's breaking things.

And as I keep pointing out but you keep ignoring - not doing it breaks
even more things, by a factor of quite a lot.

> .. and what the hell does that matter? If the code doesn't work, it 
> doesn't work, and you might as well point to some random scribblings done 
> by a three-year-old on toilet paper rather than any "specs".

The code without the changes doesn't work either. So pick your toilet
paper.. by your argument both are toilet paper.

> causes regressions should be reverted, so that 2.6.24 is at least no worse 
> than 2.6.23 (and all earlier kernels) in this respect.

Which, as the distro bug lists for ATAPI will tell you, ain't good. Still
distro vendors can ship patches.

> We used to allow regressions. It was really painful. It's hard to debug 
> things when things sometimes break. It's much better to have a nice 
> constant monotonic improvement.

Linus, the kernel regresses all over the place every release. If it
didn't do that you'd never get any changes in. Your kernel would
fossilize like RHEL or SLES and you'd be spending weeks analysing each
changeset for possible side effects, or - as happens by necessity -
adding code paths so a fix vital to one driver ceases to share core code
with another driver - to reduce regression risk. Been there, done that,
and it's not the way progress happens.

> It's better for users, but it's much better also for developers, even if 
> you may be frustrated right now because some new code effectively gets 
> shut down until it works for everybody.

Have fun. I trust you'll be fixing the other 11 (I think it was) listed
regressions before 2.6.24 - or backing out every changeset that could be
responsible?

No I thought not - because that wouldn't be sensible either.

Alan


Re: [UNIONFS] 00/42 Unionfs and related patches review

2007-12-09 Thread hooanon05

Erez Zadok:
> (1) Cache coherency: by far, the biggest concern had been around cache
:::
> unionfs.  The solution we have implemented is to compare the mtime/ctime of
> upper/lower objects during revalidation (esp. of dentries); and if the lower
> times are newer, we reconstruct the union object (drop the older objects,
> and re-lookup them).  This time-based cache-coherency works well and is
:::

The resolution of mtime/ctime may be too low, since some filesystems set
them in units of one second, which means you cannot detect changes made
within the same second.
I think it is better to use inotify for every directory, even though it
consumes a few more resources.
Additionally, if you implement vm_operations instead of
struggling with address_space_operations or VFS patches, in order to
share the mmap-ed memory pages between lower inode and unionfs inode,
then most of issues will be gone.
You can see this approach and how it is working in http://aufs.sf.net
(and get the source file from CVS).

But I am afraid the approach of sharing memory pages will not be available
for ecryptfs.
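The sketch below shows the timestamp-based revalidation test described above and the granularity concern. Two changes within the same second are distinguishable with nanosecond timestamps. Once the filesystem stores whole seconds, they compare equal, so the second change goes unnoticed. struct stamp and both helpers are hypothetical, and the inotify alternative is not modeled.

```c
#include <assert.h>

/* Hypothetical timestamp type: seconds plus nanoseconds. */
struct stamp {
	long sec;
	long nsec;
};

/* Nonzero if the lower object's time is strictly newer than the upper's. */
static int lower_is_newer(struct stamp upper, struct stamp lower)
{
	assert(upper.nsec >= 0 && lower.nsec >= 0);
	if (lower.sec != upper.sec)
		return lower.sec > upper.sec;
	return lower.nsec > upper.nsec;
}

/* What a filesystem with 1-second timestamp granularity would store. */
static struct stamp store_whole_seconds(struct stamp t)
{
	t.nsec = 0;
	return t;
}
```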


Junjiro Okajima


Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Alan Cox
It's your kernel. It's your call, and your privilege to be wrong.

And anyone with ATAPI problems should probably test the -mm tree before
reporting anything.

Alan


Re: [PATCH RFC] sched: Fixed missed rt-balance points on priority shifts

2007-12-09 Thread Gregory Haskins
>>> On Sun, Dec 9, 2007 at  9:53 PM, in message
<[EMAIL PROTECTED]>, Gregory Haskins
<[EMAIL PROTECTED]> wrote: 

> +  * I have no doubt that this is the proper thing to do to make
> +  * sure RT tasks are properly balanced.  What I cannot wrap my
> +  * head around at this late hour is if issuing a reschedule()
> +  * here may cause issues in other circumstances.  TBD
> +  */
> + if (!task_running(rq, p))
> + resched_task(rq->curr);
> + }

It dawned on me after I sent this that a further optimization here is to 
predicate the reschedule on whether we are overloaded.  In other words:

if (!task_running(rq, p) && rt_overloaded(rq))

Regards,
-Greg



Re: [2.6.23.9] hostap_plx locks up PC when reading PCI I/O memory

2007-12-09 Thread Chris Rankin
--- Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> the memory you feed to readl() and co isnt the actual PCI resource;
> you need to use ioremap() on the PCI resource to get a pointer that you can
> then feed to readl()

I gathered that much, and there is indeed a call to ioremap() in the code. So
are you suggesting that I try replacing that ioremap() call with ioremap_nocache()?

Cheers,
Chris




Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Robert Hancock

Tejun Heo wrote:

Robert Hancock wrote:

And you're quite right in your comment that we are often too quick to
blacklist hardware instead of looking into why it really is failing.
ACPI is one of those areas where we often just need to figure out how to
be bug-to-bug compatible with what Windows is doing.


In the spirit of not blacklisting without looking deep into ACPI code,
can somebody familiar with ASL take a look at comment 11 of bug 9320?

  http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11

This is libata calling _GTM to find out how the BIOS configured the
device to determine cable type.

Thanks.


I suspect it's somewhat similar (though perhaps a different cause): the
code is trying to look up a value (presumably register contents) in a
table using Match, gets a value that's not in the table (which makes
Match return the ONES value, meaning not found), and so the
lookup of the corresponding output value with that index fails. We'd 
need the full ASL dump to know exactly what's going on there.
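Here is a simplified C model of the failure mode suspected above, assuming 64-bit ACPI integers:

 - A Match-style search returns Ones when the value is absent.
 - Indexing an output table with that result then fails.

The helper names are hypothetical and the tables in the usage are made up. Only the shape of the lookup follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ACPI "Ones": all bits set (64-bit integers assumed). */
#define ACPI_ONES UINT64_MAX

/* Simplified equality-only Match(): first matching index, or Ones. */
static uint64_t asl_match(const uint64_t *pkg, size_t n, uint64_t value)
{
	size_t i;

	assert(pkg != NULL);
	for (i = 0; i < n; i++)
		if (pkg[i] == value)
			return i;
	return ACPI_ONES;
}

/*
 * Bounds-checked Index(): store pkg[idx] and return 0, or return -1 when
 * idx is out of range (as Ones is for any small table), which is roughly
 * where an AML interpreter would fail the lookup.
 */
static int asl_index(const uint64_t *pkg, size_t n, uint64_t idx, uint64_t *val)
{
	if (idx >= n)
		return -1;
	*val = pkg[idx];
	return 0;
}
```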


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [lm-sensors] 2.6.24-rc4 hwmon it87 probe fails

2007-12-09 Thread Elvis Pranskevichus
On Sunday December 9 2007 09:31:27 pm Shaohua Li wrote:
> On Sun, 2007-12-09 at 23:04 +0100, Adrian Bunk wrote:
> > On Sun, Dec 09, 2007 at 04:12:25PM -0500, Elvis Pranskevichus wrote:
> > > Jean Delvare wrote:
> > > > Hi Mike,
> > > >
> > > > On Sat, 8 Dec 2007 21:22:34 -0500, Mike Houston wrote:
> > > >> On Sun, 9 Dec 2007 01:05:54 +0100
> > > >>
> > > >> Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > > >> > On Tue, Dec 04, 2007 at 09:51:54PM -0500, Mike Houston wrote:
> > > >> > > I finally got around to testing Linux 2.6.24 (2.6.24-rc4) and
> > > >> > > found that the it87 driver fails to probe and consequently, my
> > > >> > > sensors no longer work. This was fine with Linux 2.6.23.8 (the
> > > >> > > last kernel I was using)
> > > >> > >
> > > >> > > The necessary modules load, but:
> > > >> > >
> > > >> > > it87: Found IT8718F chip at 0x290, revision 2
> > > >> > > it87: in3 is VCC (+5V)
> > > >> > > it87 it87.656: Failed to request region 0x290-0x297
> > > >> > > it87: probe of it87.656 failed with error -16
> > > >> > >
> > > >> > > Coretemp still works.
> > > >> > >
> > > >> > > It appears it has something to do with the ioport range being
> > > >> > > reserved for some reason:
> > > >> > >
> > > >> > > system 00:01: ioport range 0x290-0x29f has been reserved
> > > >> >
> > > >> > Thanks for your report.
> > > >> >
> > > >> > Please also provide:
> > > >> > - dmesg from 2.6.23.8
> > > >> > - The output of "cat /proc/ioports" for both kernels
> > > >>
> > > >> Thanks Adrian, here is the information you have requested, for
> > > >> both kernels (I have 2.6.23.9 now though where it87 still works)
> > > >>
> > > >> Linux 2.6.23.9:
> > > >> http://www.mikeserv.com/temp/proc_ioports-2.6.23.9.txt
> > > >> http://www.mikeserv.com/temp/dmesg-2.6.23.9.txt
> > > >> http://www.mikeserv.com/temp/config-2.6.23.9.txt
> > > >>
> > > >> Linux 2.6.24-rc4:
> > > >> http://www.mikeserv.com/temp/proc_ioports-2.6.24-rc4.txt
> > > >> http://www.mikeserv.com/temp/dmesg-2.6.24-rc4.txt
> > > >
> > > > This one shows:
> > > >
> > > > system 00:01: ioport range 0x290-0x29f has been reserved
> > > > (...)
> > > > system 00:01: ioport range 0x290-0x294 has been reserved
> > > >
> > > > This is clearly not correct as both areas overlap. The second
> > > > reservation is responsible for the it87 breakage, because it
> > > > conflicts with what the it87 driver later attempts to request
> > > > (0x290-0x297). The first is wrong as well (the IT87xxF environment
> > > > controller I/O area is 8 port wide, not 16) but shouldn't be a
> > > > problem in practice.
> > > >
> > > > These port reservations weren't happening in 2.6.23.9 according to
> > > > your dmesg output for that kernel. I don't know what changed in this
> > > > area since 2.6.23.9, maybe Bjorn or Adam (Cc'd) can tell.
> > >
> > > Hi,
> > >
> > > I have exactly the same problem here on a Gigabyte GA-965G-DS3
> > > motherboard based box:
> > >
> > > it87: Found IT8718F chip at 0x290, revision 1
> > > it87: in3 is VCC (+5V)
> > > it87 it87.656: Failed to request region 0x290-0x297
> > > it87: probe of it87.656 failed with error -16
> > >
> > > git bisecting revealed the offending commit:
> > >
> > > a7839e960675b54: PNP: increase the maximum number of resources
> > >
> > > Happened between rc3 and rc4.
> >
> > Thanks for doing the work of bisecting!
> >
> > > > Either way, the overlapping areas smell like a BIOS bug, meaning that
> > > > you should look for an updated BIOS for your system first.
> > > >
> > > >> http://www.mikeserv.com/temp/config-2.6.24-rc4.txt
> > >
> > > This indeed looks like a broken ACPI BIOS since the aforementioned
> > > commit touches only the PNP ACPI driver. I'm not sure how to work
> > > around this, though. Ideas?
> >
> > People responsible for this commit + ACPI maintainer added to Cc.
>
> This should exist in previous kernel (before we remove acpi motherboard
> driver) too. Basically it's a broken BIOS. Could below patch work around
> it?
>
> Thanks,
> Shaohua
>
> Index: linux/drivers/pnp/system.c
> ===
> --- linux.orig/drivers/pnp/system.c   2007-12-10 10:17:46.0 +0800
> +++ linux/drivers/pnp/system.c  2007-12-10 10:24:42.0 +0800
> @@ -22,7 +22,7 @@ static const struct pnp_device_id pnp_de
>   {"", 0}
>  };
>
> -static void reserve_range(struct pnp_dev *dev, resource_size_t start,
> +static struct resource* reserve_range(struct pnp_dev *dev, resource_size_t start, resource_size_t end, int port)
>  {
>   char *regionid;
> @@ -31,16 +31,14 @@ static void reserve_range(struct pnp_dev
>
>   regionid = kmalloc(16, GFP_KERNEL);
>   if (!regionid)
> - return;
> + return NULL;
>
>   snprintf(regionid, 16, "pnp %s", pnpid);
>   if (port)
>   res = request_region(start, end - start + 1, regionid);
>   else
>   res = request_mem_region(start, end - start + 1, regionid);
> - if (res)
> -  

Re: [PATCH 2.6.24-rc4] proc: Remove/Fix proc generic d_revalidate

2007-12-09 Thread Petr Vandrovec

Eric W. Biederman wrote:

Ultimately to implement /proc perfectly we need an implementation
of d_revalidate because files and directories can be removed behind
the back of the VFS, and d_revalidate is the only way we can let
the VFS know that this has happened.

So until we get a proper test for keeping dentries in the dcache
fix the current d_revalidate method by completely removing it.  This
returns us to the current status quo.


Hello,
   I know that I'm late to the party, but mount points are not the only
problem with d_revalidate.  With your patch in place, the module below gets
its refcount incremented by two every time I do 'ls -la /proc/fs/vmblock'.



#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/proc_fs.h>

static int vmblockinit(void) {
   struct proc_dir_entry *controlProcDirEntry;

   /* Create /proc/fs/vmblock */
   controlProcDirEntry = proc_mkdir("vmblock", proc_root_fs);
   if (!controlProcDirEntry) {
  printk(KERN_DEBUG "Bad...\n");
  return -EINVAL;
   }
   controlProcDirEntry->owner = THIS_MODULE;
   return 0;
}

static void vmblockexit(void) {
   remove_proc_entry("vmblock", proc_root_fs);
}

module_init(vmblockinit);
module_exit(vmblockexit);


(code comes from VMware's vmblock module, 
http://sourceforge.net/project/showfiles.php?group_id=204462)

Thanks,
Petr

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC] sched: Fixed missed rt-balance points on priority shifts

2007-12-09 Thread Gregory Haskins
Hi Ingo, Steven, Dmitry,
   Here is a proposed fix for the issue that Dmitry brought up today.  It
   should apply cleanly to sched-devel (though a few of my other submitted
   fixes that are not yet in sched-devel are queued ahead of it, so if you
   have a problem, let me know and I will rebase/resubmit).

Regards,
-Greg

--
sched: Fixed missed rt-balance points on priority shifts

Dmitry Adamushko identified several holes in the rt-migration strategy relating
to changing priority via sched_setscheduler or rt_mutex_setprio:

http://lkml.org/lkml/2007/12/9/94

This patch should button up those conditions.
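A userspace model may make the needs_pull handshake easier to follow. It is a sketch under stated assumptions, not the scheduler code: struct model_rt_rq and the model_* helpers are hypothetical, per-priority counters stand in for the real RT priority array, lower numbers mean higher priority (as with MAX_RT_PRIO in the kernel), and neither pull_rt_task() nor the resched_task() call is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: smaller number == higher RT priority, and
 * MODEL_MAX_RT_PRIO means "no RT task queued". */
#define MODEL_MAX_RT_PRIO 100

struct model_rt_rq {
	int nr[MODEL_MAX_RT_PRIO]; /* queued RT tasks per priority level */
	int highest_prio;          /* best queued priority, or MODEL_MAX_RT_PRIO */
	int needs_pull;            /* set when highest_prio gets worse */
};

static void model_init(struct model_rt_rq *rt)
{
	memset(rt, 0, sizeof(*rt));
	rt->highest_prio = MODEL_MAX_RT_PRIO;
}

static void model_enqueue(struct model_rt_rq *rt, int prio)
{
	assert(prio >= 0 && prio < MODEL_MAX_RT_PRIO);
	rt->nr[prio]++;
	if (prio < rt->highest_prio)
		rt->highest_prio = prio;
}

/* Like the patched dec_rt_tasks(): remember the old highest priority,
 * recompute it, and flag a pull if the runqueue's priority dropped. */
static void model_dequeue(struct model_rt_rq *rt, int prio)
{
	int old_highest = rt->highest_prio;
	int p;

	assert(prio >= 0 && prio < MODEL_MAX_RT_PRIO && rt->nr[prio] > 0);
	rt->nr[prio]--;
	for (p = 0; p < MODEL_MAX_RT_PRIO && rt->nr[p] == 0; p++)
		;
	rt->highest_prio = p;
	if (rt->highest_prio > old_highest)
		rt->needs_pull = 1;
}

/* Like the patched schedule_balance_rt(): consume the flag, clearing it
 * before the (unmodelled) pull.  Returns true if a pull would be tried. */
static bool model_schedule_balance(struct model_rt_rq *rt)
{
	if (!rt->needs_pull)
		return false;
	rt->needs_pull = 0;
	return true;
}
```

In this model, removing a lower-priority task leaves highest_prio unchanged and sets no flag, while removing the top task lets highest_prio rise numerically and arms needs_pull, which the next balance pass consumes exactly once.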

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
CC: Dmitry Adamushko <[EMAIL PROTECTED]>
---

 kernel/sched.c|8 
 kernel/sched_rt.c |   46 +-
 2 files changed, 53 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 02f04bc..fd08ac2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -348,6 +348,7 @@ struct rt_rq {
/* highest queued rt task prio */
int highest_prio;
int overloaded;
+   int needs_pull;
 };
 
 #ifdef CONFIG_SMP
@@ -4037,6 +4038,9 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
check_preempt_curr(rq, p);
}
}
+
+   wakeup_balance_rt(rq, p);
+
task_rq_unlock(rq, &flags);
 }
 
@@ -4341,6 +4345,9 @@ recheck:
check_preempt_curr(rq, p);
}
}
+
+   wakeup_balance_rt(rq, p);
+
__task_rq_unlock(rq);
spin_unlock_irqrestore(&p->pi_lock, flags);
 
@@ -6887,6 +6894,7 @@ void __init sched_init(void)
INIT_LIST_HEAD(&rq->migration_queue);
rq->rt.highest_prio = MAX_RT_PRIO;
rq->rt.overloaded = 0;
+   rq->rt.needs_pull = 0;
 #endif
atomic_set(&rq->nr_iowait, 0);
 
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 65cbb78..1257575 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -84,6 +84,8 @@ static inline void inc_rt_tasks(struct task_struct *p, struct rq *rq)
 
 static inline void dec_rt_tasks(struct task_struct *p, struct rq *rq)
 {
+   int highest_prio = rq->rt.highest_prio;
+
WARN_ON(!rt_task(p));
WARN_ON(!rq->rt.rt_nr_running);
rq->rt.rt_nr_running--;
@@ -103,6 +105,42 @@ static inline void dec_rt_tasks(struct task_struct *p, struct rq *rq)
if (p->nr_cpus_allowed > 1)
rq->rt.rt_nr_migratory--;
 
+   if (rq->rt.highest_prio > highest_prio) {
+   /*
+* If the departing task is reducing our priority, we need to
+* check if we should pull tasks because it's always possible
+* that another RQ tried to push tasks away but skipped us due
+* to elevated priority.  That elevated priority is now
+* subsiding so there may be tasks that are newly eligible for
+* migration.  This pull operation is currently facilitated
+* via schedule().
+*/
+   rq->rt.needs_pull = 1;
+
+   /*
+* FIXME: I am not sure about this next part:
+*
+* If the departing task is already running, we don't need to be
+* specific about rescheduling because presumably it will
+* happen momentarily anyway.  However, if the departing task
+* was *not* the current task (#), we should invoke a
+* reschedule to make sure we have the optimal task running.
+*
+* (#) It may seem like a pathological condition to have the
+* highest priority task not also be the current task.  However
+* consider the condition where this highest task was enqueued
+* and subsequently dequeued before the RQ ever had a chance to
+* reschedule.
+*
+* I have no doubt that this is the proper thing to do to make
+* sure RT tasks are properly balanced.  What I cannot wrap my
+* head around at this late hour is if issuing a reschedule()
+* here may cause issues in other circumstances.  TBD
+*/
+   if (!task_running(rq, p))
+   resched_task(rq->curr);
+   }
+
update_rt_migration(rq);
 #endif /* CONFIG_SMP */
 }
@@ -662,8 +700,14 @@ static int pull_rt_task(struct rq *this_rq)
 static void schedule_balance_rt(struct rq *rq, struct task_struct *prev)
 {
/* Try to pull RT tasks here if we lower this rq's prio */
-   if (unlikely(rt_task(prev)) && rq->rt.highest_prio > prev->prio)
+   if (unlikely(rq->rt.needs_pull)) {
+   /*
+* Clear the flag first, since pulling may release the lock
+* and someone else may re-set 

Re: [PATCH] iwlwifi3945/4965 - fix rate control algo reference leak

2007-12-09 Thread Zhu Yi

On Sat, 2007-12-08 at 09:56 -0500, Mark Lord wrote:
> 
> Any chance of getting LEDs support re-added to this driver,
> perhaps in the 2.6.25 timeframe?

I'd also like to see it happen. Stay tuned.

Thanks,
-yi


[PATCH 40/42] eCryptfs: use simplified fs_stack API for dentry operations

2007-12-09 Thread Erez Zadok
CC: Mike Halcrow <[EMAIL PROTECTED]>

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/ecryptfs/dentry.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index cb20b96..a8c1686 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -62,7 +62,7 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
struct inode *lower_inode =
ecryptfs_inode_to_lower(dentry->d_inode);
 
-   fsstack_copy_attr_all(dentry->d_inode, lower_inode, NULL);
+   fsstack_copy_attr_all(dentry->d_inode, lower_inode);
}
 out:
return rc;
-- 
1.5.2.2



Re: PROBLEM: /proc/cpuinfo reports erroneous CPU frequency. (2.6.23 & 2.6.22)

2007-12-09 Thread Frans Pop
Alexander Rajula wrote:
> While overclocking an AMD Athlon X2 (2GHz) CPU /proc/cpuinfo reports the
> wrong CPU frequency. I am quite puzzled by this.
> Is this an error in the kernel, or is there something strange going on?

You may want to read some old threads on this list and check if that matches
what you are seeing:
http://lkml.org/lkml/2006/3/29/150
http://lkml.org/lkml/2006/10/26/182

Cheers,
FJP


[PATCH 20/42] Unionfs: readdir state helpers

2007-12-09 Thread Erez Zadok
Includes duplicate name elimination and whiteout-handling code.
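The bucket-count heuristic in guesstimate_hash_size() below, and the power-of-two rounding that alloc_rdstate() applies to the allocation size, can be checked in isolation. This is a hypothetical userspace sketch: the constants are copied from the patch, but the model_* names, the array of lower directory sizes, and the convention that a negative size means "no lower inode on this branch" are assumptions made for the example.

```c
#include <assert.h>

/* Constants copied from rdstate.c. */
#define DENTPAGE 4096
#define DENTPERONEPAGE 12
#define DENTPERPAGE 118
#define MINHASHSIZE 1

/* Hypothetical stand-in for guesstimate_hash_size(): sizes[] holds the
 * i_size of each lower directory; negative means the branch has no lower
 * inode (the kernel loop skips those). */
static int model_guesstimate_hash_size(const long long *sizes, int nbranches)
{
	int hashsize = MINHASHSIZE;
	int i;

	for (i = 0; i < nbranches; i++) {
		if (sizes[i] < 0)
			continue;
		if (sizes[i] == DENTPAGE)
			hashsize += DENTPERONEPAGE;	/* single-page directory */
		else
			hashsize += (int)(sizes[i] / DENTPAGE) * DENTPERPAGE;
	}
	return hashsize;
}

/* Smallest power of two >= n for n >= 1, in the spirit of
 * __roundup_pow_of_two(). */
static unsigned long model_roundup_pow_of_two(unsigned long n)
{
	unsigned long r = 1;

	while (r < n)
		r <<= 1;
	return r;
}
```

For example, one single-page and one three-page lower directory give 1 + 12 + 3 * 118 = 367 buckets. A lower directory smaller than DENTPAGE contributes nothing, because of the integer division.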

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/rdstate.c |  285 ++
 1 files changed, 285 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/rdstate.c

diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
new file mode 100644
index 000..7ba1e1a
--- /dev/null
+++ b/fs/unionfs/rdstate.c
@@ -0,0 +1,285 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* This file contains the routines for maintaining readdir state. */
+
+/*
+ * There are two structures here: rdstate, which is a hash table
+ * of the second structure, filldir_node.
+ */
+
+/*
+ * This is a struct kmem_cache for filldir nodes, because we allocate a lot
+ * of them and they shouldn't waste memory.  If the node has a small name
+ * (as defined by the dentry structure), then we use an inline name to
+ * preserve kmalloc space.
+ */
+static struct kmem_cache *unionfs_filldir_cachep;
+
+int unionfs_init_filldir_cache(void)
+{
+   unionfs_filldir_cachep =
+   kmem_cache_create("unionfs_filldir",
+ sizeof(struct filldir_node), 0,
+ SLAB_RECLAIM_ACCOUNT, NULL);
+
+   return (unionfs_filldir_cachep ? 0 : -ENOMEM);
+}
+
+void unionfs_destroy_filldir_cache(void)
+{
+   if (unionfs_filldir_cachep)
+   kmem_cache_destroy(unionfs_filldir_cachep);
+}
+
+/*
+ * This is a tuning parameter that tells us roughly how big to make the
+ * hash table in directory entries per page.  This isn't perfect, but
+ * at least we get a hash table size that shouldn't be too overloaded.
+ * The following averages are based on my home directory.
+ * 14.44693  Overall
+ * 12.29     Single Page Directories
+ * 117.93    Multi-page directories
+ */
+#define DENTPAGE 4096
+#define DENTPERONEPAGE 12
+#define DENTPERPAGE 118
+#define MINHASHSIZE 1
+static int guesstimate_hash_size(struct inode *inode)
+{
+   struct inode *lower_inode;
+   int bindex;
+   int hashsize = MINHASHSIZE;
+
+   if (UNIONFS_I(inode)->hashsize > 0)
+   return UNIONFS_I(inode)->hashsize;
+
+   for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
+   lower_inode = unionfs_lower_inode_idx(inode, bindex);
+   if (!lower_inode)
+   continue;
+
+   if (i_size_read(lower_inode) == DENTPAGE)
+   hashsize += DENTPERONEPAGE;
+   else
+   hashsize += (i_size_read(lower_inode) / DENTPAGE) *
+   DENTPERPAGE;
+   }
+
+   return hashsize;
+}
+
+int init_rdstate(struct file *file)
+{
+   BUG_ON(sizeof(loff_t) !=
+  (sizeof(unsigned int) + sizeof(unsigned int)));
+   BUG_ON(UNIONFS_F(file)->rdstate != NULL);
+
+   UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_path.dentry->d_inode,
+fbstart(file));
+
+   return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
+}
+
+struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
+{
+   struct unionfs_dir_state *rdstate = NULL;
+   struct list_head *pos;
+
+   spin_lock(&UNIONFS_I(inode)->rdlock);
+   list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
+   struct unionfs_dir_state *r =
+   list_entry(pos, struct unionfs_dir_state, cache);
+   if (fpos == rdstate2offset(r)) {
+   UNIONFS_I(inode)->rdcount--;
+   list_del(&r->cache);
+   rdstate = r;
+   break;
+   }
+   }
+   spin_unlock(&UNIONFS_I(inode)->rdlock);
+   return rdstate;
+}
+
+struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
+{
+   int i = 0;
+   int hashsize;
+   unsigned long mallocsize = sizeof(struct unionfs_dir_state);
+   struct unionfs_dir_state *rdstate;
+
+   hashsize = guesstimate_hash_size(inode);
+   mallocsize += hashsize * sizeof(struct list_head);
+   mallocsize = __roundup_pow_of_two(mallocsize);
+
+   /* This should give us about 500 entries anyway. */
+   if (mallocsize > PAGE_SIZE)
+

[PATCH 06/42] Unionfs: documentation about renaming operations

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/unionfs/rename.txt |   31 ++
 1 files changed, 31 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/unionfs/rename.txt

diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
new file mode 100644
index 000..e20bb82
--- /dev/null
+++ b/Documentation/filesystems/unionfs/rename.txt
@@ -0,0 +1,31 @@
+Rename is a complex beast. The following table shows which rename(2) operations
+should succeed and which should fail.
+
+o: success
+E: error (either unionfs or vfs)
+X: EXDEV
+
+none = file does not exist
+file = file is a file
+dir  = file is an empty directory
+child= file is a non-empty directory
+wh   = file is a directory containing only whiteouts; this makes it logically
+   empty
+
+      none  file  dir   child wh
+file  o     o     E     E     E
+dir   o     E     o     E     o
+child X     E     X     E     X
+wh    o     E     o     E     o
+
+
+Renaming directories:
+=====================
+
+Whenever an empty (either physically or logically) directory is being renamed,
+the following sequence of events should take place:
+
+1) Remove whiteouts from both source and destination directory
+2) Rename source to destination
+3) Make destination opaque to prevent anything under it from showing up
+
-- 
1.5.2.2
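The rename matrix in rename.txt can also be read as a lookup table. The sketch below is hypothetical (rn_kind, rn_result, rename_table and rename_outcome() are not unionfs names), and it assumes that rows are the source and columns the destination, reading the last (wh) row as o E o E o, i.e. the same as the empty-directory row. A missing source is treated as an error, since rename(2) fails with ENOENT in that case.

```c
#include <assert.h>

/* Kinds of object at the source or destination name (from rename.txt). */
enum rn_kind { RN_NONE, RN_FILE, RN_DIR, RN_CHILD, RN_WH, RN_NKINDS };

/* o = success, E = error (unionfs or VFS), X = EXDEV */
enum rn_result { RN_OK, RN_ERR, RN_EXDEV };

/* Rows: source kind; columns: destination (none, file, dir, child, wh).
 * The RN_NONE row is not in rename.txt: a missing source always fails. */
static const enum rn_result rename_table[RN_NKINDS][RN_NKINDS] = {
	[RN_NONE]  = { RN_ERR,   RN_ERR, RN_ERR,   RN_ERR, RN_ERR   },
	[RN_FILE]  = { RN_OK,    RN_OK,  RN_ERR,   RN_ERR, RN_ERR   },
	[RN_DIR]   = { RN_OK,    RN_ERR, RN_OK,    RN_ERR, RN_OK    },
	[RN_CHILD] = { RN_EXDEV, RN_ERR, RN_EXDEV, RN_ERR, RN_EXDEV },
	[RN_WH]    = { RN_OK,    RN_ERR, RN_OK,    RN_ERR, RN_OK    },
};

static enum rn_result rename_outcome(enum rn_kind src, enum rn_kind dst)
{
	return rename_table[src][dst];
}
```

Encoding the matrix this way makes the "logically empty" rule visible: the wh row is identical to the dir row, while a non-empty directory (child) gets EXDEV wherever an empty one would succeed.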



[PATCH 12/42] Unionfs: common file copyup/revalidation operations

2007-12-09 Thread Erez Zadok
Includes open, ioctl, and flush operations.
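copyup_deleted_file() in the patch below builds its temporary "silly rename" name as ".unionfs", then the lower inode number as zero-padded hex, then a zero-padded hex counter. The userspace sketch below reproduces only that formatting, with the same "%*.*lx" and "%*.*x" conversions; model_silly_name() is a hypothetical helper, and a plain unsigned long stands in for i_ino.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Layout used by copyup_deleted_file(): ".unionfs" + ino + counter, each
 * number printed as exactly 2 * sizeof(type) hex digits (precision pads
 * with leading zeros).  Returns the length snprintf() reports. */
static int model_silly_name(char *buf, size_t buflen,
			    unsigned long ino, unsigned int counter)
{
	const int i_inosize = (int)sizeof(ino) * 2;
	const int countersize = (int)sizeof(counter) * 2;

	return snprintf(buf, buflen, ".unionfs%*.*lx%*.*x",
			i_inosize, i_inosize, ino,
			countersize, countersize, counter);
}
```

On a 64-bit build this yields 8 + 16 + 8 = 32 characters, matching nlen = sizeof(".unionfs") + i_inosize + countersize - 1 in the patch, so the kernel's name[nlen + 1] buffer leaves room for the terminating NUL.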

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/commonfops.c |  827 +++
 1 files changed, 827 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/commonfops.c

diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
new file mode 100644
index 000..f714e2f
--- /dev/null
+++ b/fs/unionfs/commonfops.c
@@ -0,0 +1,827 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * 1) Copyup the file
+ * 2) Rename the file to '.unionfs' - obviously
+ * stolen from NFS's silly rename
+ */
+static int copyup_deleted_file(struct file *file, struct dentry *dentry,
+  int bstart, int bindex)
+{
+   static unsigned int counter;
+   const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
+   const int countersize = sizeof(counter) * 2;
+   const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
+   char name[nlen + 1];
+   int err;
+   struct dentry *tmp_dentry = NULL;
+   struct dentry *lower_dentry;
+   struct dentry *lower_dir_dentry = NULL;
+
+   lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
+
+   sprintf(name, ".unionfs%*.*lx",
+   i_inosize, i_inosize, lower_dentry->d_inode->i_ino);
+
+   /*
+* Loop, looking for an unused temp name to copyup to.
+*
+* It's somewhat silly that we look for a free temp name in the
+* source branch (bstart) instead of the dest branch (bindex), where
+* the final name will be created.  We _will_ catch it if somehow
+* the name exists in the dest branch, but it'd be nice to catch it
+* sooner than later.
+*/
+retry:
+   tmp_dentry = NULL;
+   do {
+   char *suffix = name + nlen - countersize;
+
+   dput(tmp_dentry);
+   counter++;
+   sprintf(suffix, "%*.*x", countersize, countersize, counter);
+
+   pr_debug("unionfs: trying to rename %s to %s\n",
+dentry->d_name.name, name);
+
+   tmp_dentry = lookup_one_len(name, lower_dentry->d_parent,
+   nlen);
+   if (IS_ERR(tmp_dentry)) {
+   err = PTR_ERR(tmp_dentry);
+   goto out;
+   }
+   } while (tmp_dentry->d_inode != NULL);  /* need negative dentry */
+   dput(tmp_dentry);
+
+   err = copyup_named_file(dentry->d_parent->d_inode, file, name, bstart,
+   bindex,
+   i_size_read(file->f_path.dentry->d_inode));
+   if (err) {
+   if (unlikely(err == -EEXIST))
+   goto retry;
+   goto out;
+   }
+
+   /* bring it to the same state as an unlinked file */
+   lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
+   if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
+   atomic_inc(&lower_dentry->d_inode->i_count);
+   unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
+   lower_dentry->d_inode);
+   }
+   lower_dir_dentry = lock_parent(lower_dentry);
+   err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
+   unlock_dir(lower_dir_dentry);
+
+out:
+   if (!err)
+   unionfs_check_dentry(dentry);
+   return err;
+}
+
+/*
+ * put all references held by upper struct file and free lower file pointer
+ * array
+ */
+static void cleanup_file(struct file *file)
+{
+   int bindex, bstart, bend;
+   struct file **lower_files;
+   struct file *lower_file;
+   struct super_block *sb = file->f_path.dentry->d_sb;
+
+   lower_files = UNIONFS_F(file)->lower_files;
+   bstart = fbstart(file);
+   bend = fbend(file);
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   int i;  /* holds (possibly) updated branch index */
+   int old_bid;
+
+   lower_file = unionfs_lower_file_idx(file, bindex);
+   if (!lower_file)
+   continue;
+
+   /*
+* Find new index of matching branch with an open
+   

[PATCH 21/42] Unionfs: inode operations

2007-12-09 Thread Erez Zadok
Includes create, lookup, link, symlink, mkdir, mknod, readlink, follow_link,
put_link, permission, and setattr.
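unionfs_create() in the patch below first looks up a whiteout named ".wh.<name>" (the ".wh.foo" case in its comment) and passes dentry->d_name.len + UNIONFS_WHLEN as the lookup length. A hypothetical userspace analogue of the name construction follows; MODEL_WHPFX and MODEL_WHLEN are assumed to correspond to the ".wh." prefix and its length, and unlike the kernel's alloc_whname(), which is checked with IS_ERR(), this sketch returns NULL on allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match the ".wh." whiteout prefix used by unionfs. */
#define MODEL_WHPFX ".wh."
#define MODEL_WHLEN 4

/* Return a malloc'd ".wh.<name>" string built from the first len bytes
 * of name, or NULL if allocation fails.  The caller frees it. */
static char *model_alloc_whname(const char *name, size_t len)
{
	char *buf = malloc(len + MODEL_WHLEN + 1);

	if (!buf)
		return NULL;
	memcpy(buf, MODEL_WHPFX, MODEL_WHLEN);
	memcpy(buf + MODEL_WHLEN, name, len);
	buf[MODEL_WHLEN + len] = '\0';
	return buf;
}
```

The resulting string length is len + MODEL_WHLEN, which is the length that unionfs_create() hands to lookup_one_len().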

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/inode.c | 1154 
 1 files changed, 1154 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/inode.c

diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
new file mode 100644
index 000..63ff3d3
--- /dev/null
+++ b/fs/unionfs/inode.c
@@ -0,0 +1,1154 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static int unionfs_create(struct inode *parent, struct dentry *dentry,
+ int mode, struct nameidata *nd)
+{
+   int err = 0;
+   struct dentry *lower_dentry = NULL;
+   struct dentry *wh_dentry = NULL;
+   struct dentry *lower_parent_dentry = NULL;
+   char *name = NULL;
+   int valid = 0;
+   struct nameidata lower_nd;
+
+   unionfs_read_lock(dentry->d_sb);
+   unionfs_lock_dentry(dentry);
+
+   unionfs_lock_dentry(dentry->d_parent);
+   valid = __unionfs_d_revalidate_chain(dentry->d_parent, nd, false);
+   unionfs_unlock_dentry(dentry->d_parent);
+   if (unlikely(!valid)) {
+   err = -ESTALE;  /* same as what real_lookup does */
+   goto out;
+   }
+   valid = __unionfs_d_revalidate_chain(dentry, nd, false);
+   /*
+* It's only a bug if this dentry was not negative and couldn't be
+* revalidated (shouldn't happen).
+*/
+   BUG_ON(!valid && dentry->d_inode);
+
+   /*
+* We shouldn't create things in a read-only branch; this check is a
+* bit redundant as we don't allow branch 0 to be read-only at the
+* moment
+*/
+   err = is_robranch_super(dentry->d_sb, 0);
+   if (err) {
+   err = -EROFS;
+   goto out;
+   }
+
+   /*
+* We _always_ create on branch 0
+*/
+   lower_dentry = unionfs_lower_dentry_idx(dentry, 0);
+   if (lower_dentry) {
+   /*
+* check if whiteout exists in this branch, i.e. lookup .wh.foo
+* first.
+*/
+   name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
+   if (unlikely(IS_ERR(name))) {
+   err = PTR_ERR(name);
+   goto out;
+   }
+
+   wh_dentry = lookup_one_len(name, lower_dentry->d_parent,
+  dentry->d_name.len + UNIONFS_WHLEN);
+   if (IS_ERR(wh_dentry)) {
+   err = PTR_ERR(wh_dentry);
+   wh_dentry = NULL;
+   goto out;
+   }
+
+   if (wh_dentry->d_inode) {
+   /*
+* .wh.foo has been found, so let's unlink it
+*/
+   struct dentry *lower_dir_dentry;
+
+   lower_dir_dentry = lock_parent(wh_dentry);
+   err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
+   unlock_dir(lower_dir_dentry);
+
+   /*
+* Whiteouts are special files and should be deleted
+* no matter what (as if they never existed), in
+* order to allow this create operation to succeed.
+* This is especially important in sticky
+* directories: a whiteout may have been created by
+* one user, but the newly created file may be
+* created by another user.  Therefore, in order to
+* maintain Unix semantics, if the vfs_unlink above
+* failed, then we have to try to directly unlink the
+* whiteout.  Note: in the ODF version of unionfs,
+* whiteouts are handled much more cleanly.
+*/
+   if (err == -EPERM) {
+   struct inode *inode = lower_dir_dentry->d_inode;
+   err = inode->i_op->unlink(inode, wh_dentry);
+   }
+   if 

[PATCH 36/42] VFS: export drop_pagecache_sb

2007-12-09 Thread Erez Zadok
Needed to maintain cache coherency after branch management.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/drop_caches.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 59375ef..90410ac 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -3,6 +3,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -12,7 +13,7 @@
 /* A global variable is a bit ugly, but it keeps the code simple */
 int sysctl_drop_caches;
 
-static void drop_pagecache_sb(struct super_block *sb)
+void drop_pagecache_sb(struct super_block *sb)
 {
struct inode *inode;
 
@@ -24,6 +25,7 @@ static void drop_pagecache_sb(struct super_block *sb)
}
spin_unlock(&inode_lock);
 }
+EXPORT_SYMBOL(drop_pagecache_sb);
 
 void drop_pagecache(void)
 {
-- 
1.5.2.2



[PATCH 39/42] Put Unionfs and eCryptfs under one layered filesystems menu

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/Kconfig |   53 +
 1 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 635f3e2..cbcbbee 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1041,6 +1041,47 @@ config CONFIGFS_FS
 
 endmenu
 
+menu "Layered filesystems"
+
+config ECRYPT_FS
+   tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL && KEYS && CRYPTO && NET
+   help
+ Encrypted filesystem that operates on the VFS layer.  See
+  to learn more about
+ eCryptfs.  Userspace components are required and can be
+ obtained from .
+
+ To compile this file system support as a module, choose M here: the
+ module will be called ecryptfs.
+
+config UNION_FS
+   tristate "Union file system (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   help
+ Unionfs is a stackable unification file system, which appears to
+ merge the contents of several directories (branches), while keeping
+ their physical content separate.
+
+ See  for details
+
+config UNION_FS_XATTR
+   bool "Unionfs extended attributes"
+   depends on UNION_FS
+   help
+ Extended attributes are name:value pairs associated with inodes by
+ the kernel or by users (see the attr(5) manual page).
+
+ If unsure, say N.
+
+config UNION_FS_DEBUG
+   bool "Debug Unionfs"
+   depends on UNION_FS
+   help
+ If you say Y here, you can turn on debugging output from Unionfs.
+
+endmenu
+
 menu "Miscellaneous filesystems"
 
 config ADFS_FS
@@ -1093,18 +1134,6 @@ config AFFS_FS
  To compile this file system support as a module, choose M here: the
  module will be called affs.  If unsure, say N.
 
-config ECRYPT_FS
-   tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
-   depends on EXPERIMENTAL && KEYS && CRYPTO && NET
-   help
- Encrypted filesystem that operates on the VFS layer.  See
-  to learn more about
- eCryptfs.  Userspace components are required and can be
- obtained from .
-
- To compile this file system support as a module, choose M here: the
- module will be called ecryptfs.
-
 config HFS_FS
tristate "Apple Macintosh file system support (EXPERIMENTAL)"
depends on BLOCK && EXPERIMENTAL
-- 
1.5.2.2



[PATCH 13/42] Unionfs: basic file operations

2007-12-09 Thread Erez Zadok
Includes read, write, mmap, fsync, and fasync.
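unionfs_mmap() in the patch below computes willwrite as (vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags, which holds only when both bits are already set, so a private writable mapping is not treated as a write to the lower file. The sketch checks that this matches the more common mask-and-compare form. The MODEL_VM_* values are arbitrary distinct bits chosen for the example; the real flags live in include/linux/mm.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary distinct bits standing in for VM_READ, VM_WRITE, VM_SHARED. */
#define MODEL_VM_READ   0x1UL
#define MODEL_VM_WRITE  0x2UL
#define MODEL_VM_SHARED 0x8UL

/* Form used in unionfs_mmap(): OR-ing both bits in changes nothing only
 * if both are already present. */
static bool willwrite_or_form(unsigned long flags)
{
	return (flags | MODEL_VM_SHARED | MODEL_VM_WRITE) == flags;
}

/* Equivalent mask-and-compare form. */
static bool willwrite_mask_form(unsigned long flags)
{
	const unsigned long both = MODEL_VM_SHARED | MODEL_VM_WRITE;

	return (flags & both) == both;
}
```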

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/file.c |  227 +
 1 files changed, 227 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/file.c

diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
new file mode 100644
index 000..c922173
--- /dev/null
+++ b/fs/unionfs/file.c
@@ -0,0 +1,227 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static ssize_t unionfs_read(struct file *file, char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   int err;
+
+   unionfs_read_lock(file->f_path.dentry->d_sb);
+   err = unionfs_file_revalidate(file, false);
+   if (unlikely(err))
+   goto out;
+   unionfs_check_file(file);
+
+   err = do_sync_read(file, buf, count, ppos);
+
+out:
+   unionfs_check_file(file);
+   unionfs_read_unlock(file->f_path.dentry->d_sb);
+   return err;
+}
+
+static ssize_t unionfs_write(struct file *file, const char __user *buf,
+size_t count, loff_t *ppos)
+{
+   int err = 0;
+
+   unionfs_read_lock(file->f_path.dentry->d_sb);
+   err = unionfs_file_revalidate(file, true);
+   if (unlikely(err))
+   goto out;
+   unionfs_check_file(file);
+
+   err = do_sync_write(file, buf, count, ppos);
+   /* update our inode times upon a successful lower write */
+   if (err >= 0) {
+   unionfs_copy_attr_times(file->f_path.dentry->d_inode);
+   unionfs_check_file(file);
+   }
+
+out:
+   unionfs_read_unlock(file->f_path.dentry->d_sb);
+   return err;
+}
+
+static int unionfs_file_readdir(struct file *file, void *dirent,
+   filldir_t filldir)
+{
+   return -ENOTDIR;
+}
+
+static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   int err = 0;
+   bool willwrite;
+   struct file *lower_file;
+
+   unionfs_read_lock(file->f_path.dentry->d_sb);
+
+   /* This might be deferred to mmap's writepage */
+   willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
+   err = unionfs_file_revalidate(file, willwrite);
+   if (unlikely(err))
+   goto out;
+   unionfs_check_file(file);
+
+   /*
+* File systems which do not implement ->writepage may use
+* generic_file_readonly_mmap as their ->mmap op.  If you call
+* generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
+* But we cannot call the lower ->mmap op, so we can't tell that
+* writeable mappings won't work.  Therefore, our only choice is to
+* check if the lower file system supports the ->writepage, and if
+* not, return EINVAL (the same error that
+* generic_file_readonly_mmap returns in that case).
+*/
+   lower_file = unionfs_lower_file(file);
+   if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
+   err = -EINVAL;
+   printk(KERN_ERR "unionfs: branch %d file system does not "
+  "support writeable mmap\n", fbstart(file));
+   } else {
+   err = generic_file_mmap(file, vma);
+   if (err)
+   printk(KERN_ERR
+  "unionfs: generic_file_mmap failed %d\n", err);
+   }
+
+out:
+   if (!err) {
+   /* copyup could cause parent dir times to change */
+   unionfs_copy_attr_times(file->f_path.dentry->d_parent->d_inode);
+   unionfs_check_file(file);
+   unionfs_check_dentry(file->f_path.dentry->d_parent);
+   }
+   unionfs_read_unlock(file->f_path.dentry->d_sb);
+   return err;
+}
+
+int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+{
+   int bindex, bstart, bend;
+   struct file *lower_file;
+   struct dentry *lower_dentry;
+   struct inode *lower_inode, *inode;
+   int err = -EINVAL;
+
+   unionfs_read_lock(file->f_path.dentry->d_sb);
+   err = unionfs_file_revalidate(file, true);
+   if (unlikely(err))
+   goto out;
+   unionfs_check_file(file);
+
+   bstart = fbstart(file);
+   bend = 

[PATCH 30/42] Unionfs: debugging infrastructure

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/debug.c |  532 
 1 files changed, 532 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/debug.c

diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
new file mode 100644
index 000..c2b8b58
--- /dev/null
+++ b/fs/unionfs/debug.c
@@ -0,0 +1,532 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * Helper debugging functions for maintainers (and for users to report
+ * useful information back to maintainers)
+ */
+
+/* it's always useful to know what part of the code called us */
+#define PRINT_CALLER(fname, fxn, line) \
+   do {\
+   if (!printed_caller) {  \
+   pr_debug("PC:%s:%s:%d\n", (fname), (fxn), (line)); \
+   printed_caller = 1; \
+   }   \
+   } while (0)
+
+/*
+ * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on
+ * the fan-out of various Unionfs objects.  We check that no lower objects
+ * exist outside the start/end branch range; that all objects within are
+ * non-NULL (with some allowed exceptions); that for every lower file
+ * there's a lower dentry+inode; that the start/end ranges match for all
+ * corresponding lower objects; that open files/symlinks have only one lower
+ * object, but directories can have several; and more.
+ */
+void __unionfs_check_inode(const struct inode *inode,
+  const char *fname, const char *fxn, int line)
+{
+   int bindex;
+   int istart, iend;
+   struct inode *lower_inode;
+   struct super_block *sb;
+   int printed_caller = 0;
+   void *poison_ptr;
+
+   /* for inodes now */
+   BUG_ON(!inode);
+   sb = inode->i_sb;
+   istart = ibstart(inode);
+   iend = ibend(inode);
+   /* don't check inode if no lower branches */
+   if (istart < 0 && iend < 0)
+   return;
+   if (unlikely(istart > iend)) {
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci0: inode=%p istart/end=%d:%d\n",
+inode, istart, iend);
+   }
+   if (unlikely((istart == -1 && iend != -1) ||
+(istart != -1 && iend == -1))) {
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci1: inode=%p istart/end=%d:%d\n",
+inode, istart, iend);
+   }
+   if (!S_ISDIR(inode->i_mode)) {
+   if (unlikely(iend != istart)) {
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci2: inode=%p istart=%d iend=%d\n",
+inode, istart, iend);
+   }
+   }
+
+   for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
+   if (unlikely(!UNIONFS_I(inode))) {
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci3: no inode_info %p\n", inode);
+   return;
+   }
+   if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci4: no lower_inodes %p\n", inode);
+   return;
+   }
+   lower_inode = unionfs_lower_inode_idx(inode, bindex);
+   if (lower_inode) {
+   memset(&poison_ptr, POISON_INUSE, sizeof(void *));
+   if (unlikely(bindex < istart || bindex > iend)) {
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci5: inode/linode=%p:%p bindex=%d "
+"istart/end=%d:%d\n", inode,
+lower_inode, bindex, istart, iend);
+   } else if (unlikely(lower_inode == poison_ptr)) {
+   /* freed inode! */
+   PRINT_CALLER(fname, fxn, line);
+   pr_debug(" Ci6: inode/linode=%p:%p bindex=%d "
+"istart/end=%d:%d\n", inode,
+lower_inode, bindex, istart, iend);
+   }
+   continue;
+   }
+   /* if we get here, then lower_inode == NULL */
+   if (bindex < istart || bindex > 
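The Ci0, Ci1, Ci2 and Ci5 checks above amount to a few invariants over an inode's array of lower pointers. The user-space sketch below restates them so they can be exercised in isolation. `fanout_check` and the `FANOUT_*` flags are hypothetical names, plain `void *` pointers stand in for lower inodes, and the poison-pointer (Ci6) and missing-lower-object checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit flags naming which invariant was violated. */
enum {
	FANOUT_OK           = 0,
	FANOUT_START_GT_END = 1 << 0,	/* like Ci0 */
	FANOUT_HALF_EMPTY   = 1 << 1,	/* like Ci1: only one of start/end is -1 */
	FANOUT_NONDIR_WIDE  = 1 << 2,	/* like Ci2: non-directory spans >1 branch */
	FANOUT_OUT_OF_RANGE = 1 << 3,	/* like Ci5: lower object outside range */
};

/*
 * Check an array of nbranches lower pointers against its [start, end]
 * range.  A -1/-1 range means "no lower branches" and is accepted without
 * further checks, mirroring the early return in __unionfs_check_inode().
 */
static int fanout_check(void *const *lower, int nbranches,
			int start, int end, bool is_dir)
{
	int flags = FANOUT_OK;
	int i;

	if (start < 0 && end < 0)
		return FANOUT_OK;
	if (start > end)
		flags |= FANOUT_START_GT_END;
	if ((start == -1) != (end == -1))
		flags |= FANOUT_HALF_EMPTY;
	if (!is_dir && start != end)
		flags |= FANOUT_NONDIR_WIDE;

	/* Every non-NULL lower object must lie inside [start, end]. */
	for (i = 0; i < nbranches; i++) {
		if (lower[i] && (i < start || i > end))
			flags |= FANOUT_OUT_OF_RANGE;
	}
	return flags;
}
```

Returning a bitmask instead of printing keeps the sketch testable; the kernel code logs each violation with pr_debug() together with the caller's file, function and line.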

[PATCH 34/42] VFS path get/put ops used by Unionfs

2007-12-09 Thread Erez Zadok
Note: this will become obsolete once similar patches, now in -mm, make it to
mainline.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 include/linux/namei.h |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/linux/namei.h b/include/linux/namei.h
index 4cb4f8d..63f16d9 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 struct vfsmount;
 
@@ -100,4 +101,16 @@ static inline char *nd_get_link(struct nameidata *nd)
return nd->saved_names[nd->depth];
 }
 
+static inline void pathget(struct path *path)
+{
+   mntget(path->mnt);
+   dget(path->dentry);
+}
+
+static inline void pathput(struct path *path)
+{
+   dput(path->dentry);
+   mntput(path->mnt);
+}
+
 #endif /* _LINUX_NAMEI_H */
-- 
1.5.2.2
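pathget() and pathput() pair the two reference counts that a struct path carries: the mount is pinned before the dentry, and the dentry is released before the mount, following the usual convention of releasing in the reverse order of acquisition. A minimal user-space model of that pairing follows; the counter structs and function names are hypothetical stand-ins for struct vfsmount, struct dentry and the real helpers.

```c
#include <assert.h>

/* Hypothetical stand-ins carrying only a reference count. */
struct mnt_sketch { int refs; };
struct dentry_sketch { int refs; };

struct path_sketch {
	struct mnt_sketch *mnt;
	struct dentry_sketch *dentry;
};

/* Like pathget(): pin the mount first, then the dentry on it. */
static void pathget_sketch(struct path_sketch *p)
{
	p->mnt->refs++;
	p->dentry->refs++;
}

/* Like pathput(): drop the dentry while the mount is still pinned, then the mount. */
static void pathput_sketch(struct path_sketch *p)
{
	assert(p->dentry->refs > 0 && p->mnt->refs > 0);
	p->dentry->refs--;
	p->mnt->refs--;
}
```

Every pathget_sketch() must be balanced by one pathput_sketch(); the counts return to their starting values only when the calls are paired.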



[PATCH 16/42] Unionfs: lower-level lookup routines

2007-12-09 Thread Erez Zadok
Includes lower nameidata support routines.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/lookup.c |  652 +++
 1 files changed, 652 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/lookup.c

diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
new file mode 100644
index 000..a1904c9
--- /dev/null
+++ b/fs/unionfs/lookup.c
@@ -0,0 +1,652 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static int realloc_dentry_private_data(struct dentry *dentry);
+
+/* is the filename valid == !(whiteout for a file or opaque dir marker) */
+static int is_validname(const char *name)
+{
+   if (!strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN))
+   return 0;
+   if (!strncmp(name, UNIONFS_DIR_OPAQUE_NAME,
+sizeof(UNIONFS_DIR_OPAQUE_NAME) - 1))
+   return 0;
+   return 1;
+}
+
+/* The rest of these are utility functions for lookup. */
+static noinline int is_opaque_dir(struct dentry *dentry, int bindex)
+{
+   int err = 0;
+   struct dentry *lower_dentry;
+   struct dentry *wh_lower_dentry;
+   struct inode *lower_inode;
+   struct sioq_args args;
+
+   lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   lower_inode = lower_dentry->d_inode;
+
+   BUG_ON(!S_ISDIR(lower_inode->i_mode));
+
+   mutex_lock(&lower_inode->i_mutex);
+
+   if (!permission(lower_inode, MAY_EXEC, NULL)) {
+   wh_lower_dentry =
+   lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
+  sizeof(UNIONFS_DIR_OPAQUE) - 1);
+   } else {
+   args.is_opaque.dentry = lower_dentry;
+   run_sioq(__is_opaque_dir, &args);
+   wh_lower_dentry = args.ret;
+   }
+
+   mutex_unlock(&lower_inode->i_mutex);
+
+   if (IS_ERR(wh_lower_dentry)) {
+   err = PTR_ERR(wh_lower_dentry);
+   goto out;
+   }
+
+   /* This is an opaque dir iff wh_lower_dentry is positive */
+   err = !!wh_lower_dentry->d_inode;
+
+   dput(wh_lower_dentry);
+out:
+   return err;
+}
+
+/*
+ * Main (and complex) driver function for Unionfs's lookup
+ *
+ * Returns: NULL (ok), ERR_PTR if an error occurred, or a non-null non-error
+ * PTR if d_splice returned a different dentry.
+ *
+ * If lookupmode is INTERPOSE_PARTIAL/REVAL/REVAL_NEG, the passed dentry's
+ * inode info must be locked.  If lookupmode is INTERPOSE_LOOKUP (i.e., a
+ * newly looked-up dentry), then unionfs_lookup_backend will return a locked
+ * dentry's info, which the caller must unlock.
+ */
+struct dentry *unionfs_lookup_backend(struct dentry *dentry,
+ struct nameidata *nd, int lookupmode)
+{
+   int err = 0;
+   struct dentry *lower_dentry = NULL;
+   struct dentry *wh_lower_dentry = NULL;
+   struct dentry *lower_dir_dentry = NULL;
+   struct dentry *parent_dentry = NULL;
+   struct dentry *d_interposed = NULL;
+   int bindex, bstart = -1, bend, bopaque;
+   int dentry_count = 0;   /* Number of positive dentries. */
+   int first_dentry_offset = -1; /* -1 is uninitialized */
+   struct dentry *first_dentry = NULL;
+   struct dentry *first_lower_dentry = NULL;
+   struct vfsmount *first_lower_mnt = NULL;
+   int locked_parent = 0;
+   int opaque;
+   char *whname = NULL;
+   const char *name;
+   int namelen;
+
+   /*
+* We should already have a lock on this dentry in the case of a
+* partial lookup, or a revalidation. Otherwise it is returned from
+* new_dentry_private_data already locked.
+*/
+   if (lookupmode == INTERPOSE_PARTIAL || lookupmode == INTERPOSE_REVAL ||
+   lookupmode == INTERPOSE_REVAL_NEG)
+   verify_locked(dentry);
+   else /* this could only be INTERPOSE_LOOKUP */
+   BUG_ON(UNIONFS_D(dentry) != NULL);
+
+   switch (lookupmode) {
+   case INTERPOSE_PARTIAL:
+   break;
+   case INTERPOSE_LOOKUP:
+   err = new_dentry_private_data(dentry);
+   if (unlikely(err))
+   goto out;
+   break;
+   default:
+   /* default: can 
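is_validname() rejects two kinds of names: whiteouts, which start with the whiteout prefix, and the opaque-directory marker. The sketch below reproduces the same prefix tests in user space. The string values are assumptions (".wh." for UNIONFS_WHPFX and "__dir_opaque" for UNIONFS_DIR_OPAQUE_NAME); the real definitions live in a header that is not part of this excerpt.

```c
#include <assert.h>
#include <string.h>

/* Assumed values; the real macros are defined in a header not shown here. */
#define WHPFX_SKETCH       ".wh."
#define WHLEN_SKETCH       (sizeof(WHPFX_SKETCH) - 1)
#define OPAQUE_NAME_SKETCH "__dir_opaque"

/* Returns 1 for an ordinary name, 0 for a whiteout or the opaque marker. */
static int is_validname_sketch(const char *name)
{
	if (!strncmp(name, WHPFX_SKETCH, WHLEN_SKETCH))
		return 0;	/* whiteout for some file */
	if (!strncmp(name, OPAQUE_NAME_SKETCH, sizeof(OPAQUE_NAME_SKETCH) - 1))
		return 0;	/* opaque-directory marker */
	return 1;
}
```

Because both tests are prefix matches, a name such as "__dir_opaque.bak" is rejected too.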

[PATCH 11/42] Unionfs: main header file

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/union.h |  591 
 1 files changed, 591 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/union.h

diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
new file mode 100644
index 000..20bff7b
--- /dev/null
+++ b/fs/unionfs/union.h
@@ -0,0 +1,591 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UNION_H_
+#define _UNION_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+
+/* the file system name */
+#define UNIONFS_NAME "unionfs"
+
+/* unionfs root inode number */
+#define UNIONFS_ROOT_INO 1
+
+/* number of times we try to get a unique temporary file name */
+#define GET_TMPNAM_MAX_RETRY   5
+
+/* maximum number of branches we support, to avoid memory blowup */
+#define UNIONFS_MAX_BRANCHES   128
+
+/* minimum time (seconds) required for time-based cache-coherency */
+#define UNIONFS_MIN_CC_TIME 3
+
+/* Operations vectors defined in specific files. */
+extern struct file_operations unionfs_main_fops;
+extern struct file_operations unionfs_dir_fops;
+extern struct inode_operations unionfs_main_iops;
+extern struct inode_operations unionfs_dir_iops;
+extern struct inode_operations unionfs_symlink_iops;
+extern struct super_operations unionfs_sops;
+extern struct dentry_operations unionfs_dops;
+extern struct address_space_operations unionfs_aops;
+
+/* How long should an entry be allowed to persist */
+#define RDCACHE_JIFFIES (5*HZ)
+
+/* file private data. */
+struct unionfs_file_info {
+   int bstart;
+   int bend;
+   atomic_t generation;
+
+   struct unionfs_dir_state *rdstate;
+   struct file **lower_files;
+   int *saved_branch_ids; /* IDs of branches when file was opened */
+};
+
+/* unionfs inode data in memory */
+struct unionfs_inode_info {
+   int bstart;
+   int bend;
+   atomic_t generation;
+   int stale;
+   /* Stuff for readdir over NFS. */
+   spinlock_t rdlock;
+   struct list_head readdircache;
+   int rdcount;
+   int hashsize;
+   int cookie;
+
+   /* The lower inodes */
+   struct inode **lower_inodes;
+
+   struct inode vfs_inode;
+};
+
+/* unionfs dentry data in memory */
+struct unionfs_dentry_info {
+   /*
+* The mutex is used to lock the dentry as soon as we get into a
+* unionfs function from the VFS.  Our lock ordering is that children
+* go before their parents.
+*/
+   struct mutex lock;
+   int bstart;
+   int bend;
+   int bopaque;
+   int bcount;
+   atomic_t generation;
+   struct path *lower_paths;
+};
+
+/* These are the pointers to our various objects. */
+struct unionfs_data {
+   struct super_block *sb;
+   atomic_t open_files; /* number of open files on branch */
+   int branchperms;
+   int branch_id;  /* unique branch ID at re/mount time */
+};
+
+/* unionfs super-block data in memory */
+struct unionfs_sb_info {
+   int bend;
+
+   atomic_t generation;
+
+   /*
+* This rwsem is used to make sure that a branch management
+* operation...
+*   1) will not begin before all currently in-flight operations
+*  complete.
+*   2) any new operations do not execute until the currently
+*  running branch management operation completes.
+*
+* The write_lock_owner records the PID of the task which grabbed
+* the rw_sem for writing.  If the same task also tries to grab the
+* read lock, we allow it.  This prevents a self-deadlock when
+* branch-management is used on a pivot_root'ed union, because we
+* have to ->lookup paths which belong to the same union.
+*/
+   struct rw_semaphore rwsem;
+   pid_t write_lock_owner; /* PID of rw_sem owner (write lock) */
+   int high_branch_id; /* last unique branch ID given */
+   struct unionfs_data *data;
+};
+
+/*
+ * structure for making the linked list of entries by readdir on 

[PATCH 35/42] Unionfs: common header file for user-land utilities and kernel

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 include/linux/union_fs.h |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/union_fs.h

diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
new file mode 100644
index 000..d29318f
--- /dev/null
+++ b/include/linux/union_fs.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_UNION_FS_H
+#define _LINUX_UNION_FS_H
+
+#define UNIONFS_VERSION  "2.1-mm"
+
+/*
+ * DEFINITIONS FOR USER AND KERNEL CODE:
+ */
+# define UNIONFS_IOCTL_INCGEN  _IOR(0x15, 11, int)
+# define UNIONFS_IOCTL_QUERYFILE   _IOR(0x15, 15, int)
+
+#endif /* _LINUX_UNION_FS_H */
+
-- 
1.5.2.2
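Both command numbers are built with _IOR(), so each encodes a direction, a type byte (0x15), a command number and an argument size. The sketch below decodes them with the _IOC_* helpers from <linux/ioctl.h>. It copies the two definitions so it stands alone, and it deliberately does not call ioctl(2), because what each command reads or returns is not shown in this patch; `decode_ioctl` and `struct ioctl_fields` are hypothetical names.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Same encodings as include/linux/union_fs.h, copied so the sketch stands alone. */
#define UNIONFS_IOCTL_INCGEN    _IOR(0x15, 11, int)
#define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)

/* Decoded fields of an ioctl request number. */
struct ioctl_fields {
	unsigned int dir, type, nr, size;
};

static struct ioctl_fields decode_ioctl(unsigned int cmd)
{
	struct ioctl_fields f = {
		.dir  = _IOC_DIR(cmd),
		.type = _IOC_TYPE(cmd),
		.nr   = _IOC_NR(cmd),
		.size = _IOC_SIZE(cmd),
	};
	return f;
}
```

Sharing one type byte keeps the two commands in the same ioctl "namespace"; only the command number distinguishes them.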



[PATCH 10/42] Unionfs: fanout header definitions

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/fanout.h |  355 +++
 1 files changed, 355 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/fanout.h

diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
new file mode 100644
index 000..864383e
--- /dev/null
+++ b/fs/unionfs/fanout.h
@@ -0,0 +1,355 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _FANOUT_H_
+#define _FANOUT_H_
+
+/*
+ * Inode to private data
+ *
+ * Since we use containers and the struct inode is _inside_ the
+ * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL
+ * inode pointer), return a valid non-NULL pointer.
+ */
+static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
+{
+   return container_of(inode, struct unionfs_inode_info, vfs_inode);
+}
+
+#define ibstart(ino) (UNIONFS_I(ino)->bstart)
+#define ibend(ino) (UNIONFS_I(ino)->bend)
+
+/* Superblock to private data */
+#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
+#define sbstart(sb) 0
+#define sbend(sb) (UNIONFS_SB(sb)->bend)
+#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
+#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id)
+
+/* File to private data */
+#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
+#define fbstart(file) (UNIONFS_F(file)->bstart)
+#define fbend(file) (UNIONFS_F(file)->bend)
+
+/* helpers to manipulate branch IDs stored in our superblock */
+static inline int branch_id(struct super_block *sb, int index)
+{
+   BUG_ON(!sb || index < 0);
+   return UNIONFS_SB(sb)->data[index].branch_id;
+}
+
+static inline void set_branch_id(struct super_block *sb, int index, int val)
+{
+   BUG_ON(!sb || index < 0);
+   UNIONFS_SB(sb)->data[index].branch_id = val;
+}
+
+static inline void new_branch_id(struct super_block *sb, int index)
+{
+   BUG_ON(!sb || index < 0);
+   set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id);
+}
+
+/*
+ * Find new index of matching branch with an existing superblock of a known
+ * (possibly old) id.  This is needed because branches could have been
+ * added/deleted causing the branches of any open files to shift.
+ *
+ * @sb: the new superblock which may have new/different branch IDs
+ * @id: the old/existing id we're looking for
+ * Returns index of newly found branch (0 or greater), -1 otherwise.
+ */
+static inline int branch_id_to_idx(struct super_block *sb, int id)
+{
+   int i;
+   for (i = 0; i < sbmax(sb); i++) {
+   if (branch_id(sb, i) == id)
+   return i;
+   }
+   /* in the non-ODF code, this should really never happen */
+   printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id);
+   return -1;
+}
+
+/* File to lower file. */
+static inline struct file *unionfs_lower_file(const struct file *f)
+{
+   BUG_ON(!f);
+   return UNIONFS_F(f)->lower_files[fbstart(f)];
+}
+
+static inline struct file *unionfs_lower_file_idx(const struct file *f,
+ int index)
+{
+   BUG_ON(!f || index < 0);
+   return UNIONFS_F(f)->lower_files[index];
+}
+
+static inline void unionfs_set_lower_file_idx(struct file *f, int index,
+ struct file *val)
+{
+   BUG_ON(!f || index < 0);
+   UNIONFS_F(f)->lower_files[index] = val;
+   /* save branch ID (may be redundant?) */
+   UNIONFS_F(f)->saved_branch_ids[index] =
+   branch_id((f)->f_path.dentry->d_sb, index);
+}
+
+static inline void unionfs_set_lower_file(struct file *f, struct file *val)
+{
+   BUG_ON(!f);
+   unionfs_set_lower_file_idx((f), fbstart(f), (val));
+}
+
+/* Inode to lower inode. */
+static inline struct inode *unionfs_lower_inode(const struct inode *i)
+{
+   BUG_ON(!i);
+   return UNIONFS_I(i)->lower_inodes[ibstart(i)];
+}
+
+static inline struct inode *unionfs_lower_inode_idx(const struct inode *i,
+   int index)
+{
+   BUG_ON(!i || index < 0);
+   return UNIONFS_I(i)->lower_inodes[index];
+}
+
+static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
+  struct inode *val)
+{
+   BUG_ON(!i || index < 0);
+   
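Two ideas in fanout.h are easy to check in isolation. UNIONFS_I() is container_of() pointer arithmetic on an embedded struct inode, which is why it cannot return NULL for a non-NULL inode. branch_id_to_idx() maps a branch ID saved when a file was opened back to that branch's current index after branches have been added. The user-space sketch below models both; every type and function name in it is hypothetical, and the fixed eight-entry branch table is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space container_of(): no type checking. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_BRANCHES_SKETCH 8

/* Hypothetical stand-ins for struct inode and struct unionfs_inode_info. */
struct vfs_inode_sketch {
	unsigned long ino;
};

struct union_inode_info_sketch {
	int bstart;
	int bend;
	struct vfs_inode_sketch vfs_inode;	/* embedded, not a pointer */
};

/* Like UNIONFS_I(): pure pointer arithmetic on the embedded member. */
static struct union_inode_info_sketch *
union_i_sketch(const struct vfs_inode_sketch *inode)
{
	return container_of_sketch(inode, struct union_inode_info_sketch,
				   vfs_inode);
}

/* Hypothetical branch table: branch_ids[i] is the ID of branch i. */
struct sb_branches_sketch {
	int nbranches;
	int branch_ids[MAX_BRANCHES_SKETCH];
	int high_branch_id;	/* last ID handed out */
};

/* Insert a branch at pos, shift later branches up, give it a fresh ID. */
static void insert_branch_sketch(struct sb_branches_sketch *sb, int pos)
{
	int i;

	assert(sb->nbranches < MAX_BRANCHES_SKETCH);
	assert(pos >= 0 && pos <= sb->nbranches);
	for (i = sb->nbranches; i > pos; i--)
		sb->branch_ids[i] = sb->branch_ids[i - 1];
	sb->nbranches++;
	sb->branch_ids[pos] = ++sb->high_branch_id;	/* like new_branch_id() */
}

/* Like branch_id_to_idx(): current index of a saved ID, or -1 if gone. */
static int branch_id_to_idx_sketch(const struct sb_branches_sketch *sb, int id)
{
	int i;

	for (i = 0; i < sb->nbranches; i++)
		if (sb->branch_ids[i] == id)
			return i;
	return -1;
}
```

Because IDs only ever increase, an ID saved before a branch insertion still identifies the same branch afterwards, even though its index has shifted.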

[PATCH 17/42] Unionfs: rename method and helpers

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/rename.c |  533 +++
 1 files changed, 533 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/rename.c

diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
new file mode 100644
index 000..452d1e7
--- /dev/null
+++ b/fs/unionfs/rename.c
@@ -0,0 +1,533 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+   struct inode *new_dir, struct dentry *new_dentry,
+   int bindex, struct dentry **wh_old)
+{
+   int err = 0;
+   struct dentry *lower_old_dentry;
+   struct dentry *lower_new_dentry;
+   struct dentry *lower_old_dir_dentry;
+   struct dentry *lower_new_dir_dentry;
+   struct dentry *lower_wh_dentry;
+   struct dentry *lower_wh_dir_dentry;
+   char *wh_name = NULL;
+
+   lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
+   lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
+
+   if (!lower_new_dentry) {
+   lower_new_dentry =
+   create_parents(new_dentry->d_parent->d_inode,
+  new_dentry, new_dentry->d_name.name,
+  bindex);
+   if (IS_ERR(lower_new_dentry)) {
+   err = PTR_ERR(lower_new_dentry);
+   if (IS_COPYUP_ERR(err))
+   goto out;
+   printk(KERN_ERR "unionfs: error creating directory "
+  "tree for rename, bindex=%d err=%d\n",
+  bindex, err);
+   goto out;
+   }
+   }
+
+   wh_name = alloc_whname(new_dentry->d_name.name,
+  new_dentry->d_name.len);
+   if (unlikely(IS_ERR(wh_name))) {
+   err = PTR_ERR(wh_name);
+   goto out;
+   }
+
+   lower_wh_dentry = lookup_one_len(wh_name, lower_new_dentry->d_parent,
+new_dentry->d_name.len +
+UNIONFS_WHLEN);
+   if (IS_ERR(lower_wh_dentry)) {
+   err = PTR_ERR(lower_wh_dentry);
+   goto out;
+   }
+
+   if (lower_wh_dentry->d_inode) {
+   /* get rid of the whiteout that is existing */
+   if (lower_new_dentry->d_inode) {
+   printk(KERN_ERR "unionfs: both a whiteout and a "
+  "dentry exist when doing a rename!\n");
+   err = -EIO;
+
+   dput(lower_wh_dentry);
+   goto out;
+   }
+
+   lower_wh_dir_dentry = lock_parent(lower_wh_dentry);
+   err = is_robranch_super(old_dentry->d_sb, bindex);
+   if (!err)
+   err = vfs_unlink(lower_wh_dir_dentry->d_inode,
+lower_wh_dentry);
+
+   dput(lower_wh_dentry);
+   unlock_dir(lower_wh_dir_dentry);
+   if (err)
+   goto out;
+   } else {
+   dput(lower_wh_dentry);
+   }
+
+   dget(lower_old_dentry);
+   lower_old_dir_dentry = dget_parent(lower_old_dentry);
+   lower_new_dir_dentry = dget_parent(lower_new_dentry);
+
+   lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+
+   err = is_robranch_super(old_dentry->d_sb, bindex);
+   if (err)
+   goto out_unlock;
+
+   /*
+* ready to whiteout for old_dentry. caller will create the actual
+* whiteout, and must dput(*wh_old)
+*/
+   if (wh_old) {
+   char *whname;
+   whname = alloc_whname(old_dentry->d_name.name,
+ old_dentry->d_name.len);
+   err = PTR_ERR(whname);
+   if (unlikely(IS_ERR(whname)))
+   goto out_unlock;
+   *wh_old = lookup_one_len(whname, lower_old_dir_dentry,
+old_dentry->d_name.len +
+UNIONFS_WHLEN);

[PATCH 19/42] Unionfs: readdir helper functions

2007-12-09 Thread Erez Zadok
Includes whiteout handling for directories.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/dirhelper.c |  272 
 1 files changed, 272 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/dirhelper.c

diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
new file mode 100644
index 000..2e52fc3
--- /dev/null
+++ b/fs/unionfs/dirhelper.c
@@ -0,0 +1,272 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * Delete all of the whiteouts in a given directory for rmdir.
+ *
+ * lower directory inode should be locked
+ */
+int do_delete_whiteouts(struct dentry *dentry, int bindex,
+   struct unionfs_dir_state *namelist)
+{
+   int err = 0;
+   struct dentry *lower_dir_dentry = NULL;
+   struct dentry *lower_dentry;
+   char *name = NULL, *p;
+   struct inode *lower_dir;
+   int i;
+   struct list_head *pos;
+   struct filldir_node *cursor;
+
+   /* Find out lower parent dentry */
+   lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
+   lower_dir = lower_dir_dentry->d_inode;
+   BUG_ON(!S_ISDIR(lower_dir->i_mode));
+
+   err = -ENOMEM;
+   name = __getname();
+   if (unlikely(!name))
+   goto out;
+   strcpy(name, UNIONFS_WHPFX);
+   p = name + UNIONFS_WHLEN;
+
+   err = 0;
+   for (i = 0; !err && i < namelist->size; i++) {
+   list_for_each(pos, &namelist->list[i]) {
+   cursor =
+   list_entry(pos, struct filldir_node,
+  file_list);
+   /* Only operate on whiteouts in this branch. */
+   if (cursor->bindex != bindex)
+   continue;
+   if (!cursor->whiteout)
+   continue;
+
+   strcpy(p, cursor->name);
+   lower_dentry =
+   lookup_one_len(name, lower_dir_dentry,
+  cursor->namelen +
+  UNIONFS_WHLEN);
+   if (IS_ERR(lower_dentry)) {
+   err = PTR_ERR(lower_dentry);
+   break;
+   }
+   if (lower_dentry->d_inode)
+   err = vfs_unlink(lower_dir, lower_dentry);
+   dput(lower_dentry);
+   if (err)
+   break;
+   }
+   }
+
+   __putname(name);
+
+   /* After all of the removals, we should copy the attributes once. */
+   fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode);
+
+out:
+   return err;
+}
+
+/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
+int delete_whiteouts(struct dentry *dentry, int bindex,
+struct unionfs_dir_state *namelist)
+{
+   int err;
+   struct super_block *sb;
+   struct dentry *lower_dir_dentry;
+   struct inode *lower_dir;
+   struct sioq_args args;
+
+   sb = dentry->d_sb;
+
+   BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
+   BUG_ON(bindex < dbstart(dentry));
+   BUG_ON(bindex > dbend(dentry));
+   err = is_robranch_super(sb, bindex);
+   if (err)
+   goto out;
+
+   lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
+   lower_dir = lower_dir_dentry->d_inode;
+   BUG_ON(!S_ISDIR(lower_dir->i_mode));
+
+   mutex_lock(&lower_dir->i_mutex);
+   if (!permission(lower_dir, MAY_WRITE | MAY_EXEC, NULL)) {
+   err = do_delete_whiteouts(dentry, bindex, namelist);
+   } else {
+   args.deletewh.namelist = namelist;
+   args.deletewh.dentry = dentry;
+   args.deletewh.bindex = bindex;
+   run_sioq(__delete_whiteouts, &args);
+   err = args.err;
+   }
+   mutex_unlock(&lower_dir->i_mutex);
+
+out:
+   return err;
+}
+
+#define RD_NONE 0
+#define RD_CHECK_EMPTY 1
+/* The callback 
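do_delete_whiteouts() walks every hash bucket of the readdir namelist, skips entries from other branches and entries that are not whiteouts, and rebuilds each whiteout name in one reusable buffer before unlinking it. The sketch below models that loop in user space with a hypothetical simplified node type and an unlink callback, so no files are touched; the ".wh." prefix is again an assumed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplified filldir_node: one hash-chain entry per name. */
struct fd_node_sketch {
	const char *name;
	int bindex;			/* branch the entry came from */
	bool whiteout;			/* true if it was a whiteout entry */
	struct fd_node_sketch *next;
};

/*
 * Call unlink_fn on the whiteout name of every whiteout entry from branch
 * bindex, stopping at the first error.  buf is reused for every name, like
 * the __getname() buffer in do_delete_whiteouts().
 */
static int delete_whiteouts_sketch(struct fd_node_sketch **buckets, int nbuckets,
				   int bindex, char *buf, size_t bufsz,
				   int (*unlink_fn)(const char *whname, void *ctx),
				   void *ctx)
{
	int i, err = 0;
	struct fd_node_sketch *n;

	for (i = 0; !err && i < nbuckets; i++) {
		for (n = buckets[i]; n; n = n->next) {
			if (n->bindex != bindex || !n->whiteout)
				continue;
			if ((size_t)snprintf(buf, bufsz, ".wh.%s", n->name) >= bufsz)
				return -1;	/* name too long (or encoding error) */
			err = unlink_fn(buf, ctx);
			if (err)
				break;
		}
	}
	return err;
}

/* Example callback: count the names instead of unlinking anything. */
static int count_unlink(const char *whname, void *ctx)
{
	(void)whname;
	++*(int *)ctx;
	return 0;
}
```

Passing the unlink step as a callback keeps the traversal logic testable; the kernel version calls vfs_unlink() directly after a lookup_one_len() that confirms the whiteout exists.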

[PATCH 42/42] eCryptfs: use simplified fs_stack API for main operations

2007-12-09 Thread Erez Zadok
CC: Mike Halcrow <[EMAIL PROTECTED]>

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/ecryptfs/main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index b83a512..cebe7dc 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -208,7 +208,7 @@ int ecryptfs_interpose(struct dentry *lower_dentry, struct dentry *dentry,
d_add(dentry, inode);
else
d_instantiate(dentry, inode);
-   fsstack_copy_attr_all(inode, lower_inode, NULL);
+   fsstack_copy_attr_all(inode, lower_inode);
/* This size will be overwritten for real files w/ headers and
 * other metadata */
fsstack_copy_inode_size(inode, lower_inode);
-- 
1.5.2.2



[PATCH 23/42] Unionfs: address-space operations

2007-12-09 Thread Erez Zadok
Includes writepage, writepages, readpage, prepare_write, and commit_write.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/mmap.c |  338 +
 1 files changed, 338 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/mmap.c

diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
new file mode 100644
index 000..4d05352
--- /dev/null
+++ b/fs/unionfs/mmap.c
@@ -0,0 +1,338 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2006  Shaya Potter
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
+{
+   int err = -EIO;
+   struct inode *inode;
+   struct inode *lower_inode;
+   struct page *lower_page;
+   struct address_space *lower_mapping; /* lower inode mapping */
+   gfp_t mask;
+
+   BUG_ON(!PageUptodate(page));
+   inode = page->mapping->host;
+   lower_inode = unionfs_lower_inode(inode);
+   lower_mapping = lower_inode->i_mapping;
+
+   /*
+* find lower page (returns a locked page)
+*
+* We turn off __GFP_FS while we look for or create a new lower
+* page.  This prevents a recursion into the file system code, which
+* under memory pressure conditions could lead to a deadlock.  This
+* is similar to how the loop driver behaves (see loop_set_fd in
+* drivers/block/loop.c).  If we can't find the lower page, we
+* redirty our page and return "success" so that the VM will call us
+* again in the (hopefully near) future.
+*/
+   mask = mapping_gfp_mask(lower_mapping) & ~(__GFP_FS);
+   lower_page = find_or_create_page(lower_mapping, page->index, mask);
+   if (!lower_page) {
+   err = 0;
+   set_page_dirty(page);
+   goto out;
+   }
+
+   /* copy page data from our upper page to the lower page */
+   copy_highpage(lower_page, page);
+   flush_dcache_page(lower_page);
+   SetPageUptodate(lower_page);
+   set_page_dirty(lower_page);
+
+   /*
+* Call lower writepage (expects locked page).  However, if we are
+* called with wbc->for_reclaim, then the VFS/VM just wants to
+* reclaim our page.  Therefore, we don't need to call the lower
+* ->writepage: just copy our data to the lower page (already done
+* above), then mark the lower page dirty and unlock it, and return
+* success.
+*/
+   if (wbc->for_reclaim) {
+   unlock_page(lower_page);
+   goto out_release;
+   }
+
+   BUG_ON(!lower_mapping->a_ops->writepage);
+   wait_on_page_writeback(lower_page); /* prevent multiple writers */
+   clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
+   err = lower_mapping->a_ops->writepage(lower_page, wbc);
+   if (err < 0)
+   goto out_release;
+
+   /*
+* Lower file systems, such as ramfs and tmpfs, may return
+* AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly)
+* write the page again for a while.  But those lower file systems
+* also set the page dirty bit back again.  Since we successfully
+* copied our page data to the lower page, the VM will come
+* back to the lower page (directly) and try to flush it.  So we can
+* save the VM the hassle of coming back to our page and trying to
+* flush too.  Therefore, we don't re-dirty our own page, and we
+* never return AOP_WRITEPAGE_ACTIVATE back to the VM (we consider
+* this a success).
+*
+* We also unlock the lower page if the lower ->writepage returned
+* AOP_WRITEPAGE_ACTIVATE.  (This "anomalous" behaviour may be
+* addressed in future shmem/VM code.)
+*/
+   if (err == AOP_WRITEPAGE_ACTIVATE) {
+   err = 0;
+   unlock_page(lower_page);
+   }
+
+   /* all is well */
+
+   /* lower mtimes have changed: update ours */
+   unionfs_copy_attr_times(inode);
+
+out_release:
+   /* b/c find_or_create_page increased refcnt */
+   page_cache_release(lower_page);
+out:
+   /*
+* We unlock our page unconditionally, because we never return
+* 
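The comments in unionfs_writepage() describe four outcomes: no lower page (re-dirty our page and report success), reclaim (data already copied down, so skip the lower ->writepage), AOP_WRITEPAGE_ACTIVATE from the lower file system (unlock the lower page and report success), and any other lower result passed through. The pure function below models only that decision logic; the names are hypothetical and WP_ACTIVATE_SKETCH is an arbitrary stand-in for AOP_WRITEPAGE_ACTIVATE.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary stand-in for AOP_WRITEPAGE_ACTIVATE. */
#define WP_ACTIVATE_SKETCH 1

/* What the upper ->writepage decided to do. */
struct wp_outcome {
	int err;		/* value returned to the VM */
	bool called_lower;	/* lower ->writepage was invoked */
	bool unlock_lower;	/* we unlocked the lower page ourselves */
	bool redirty_upper;	/* upper page re-dirtied for a later retry */
};

/*
 * Hypothetical model of the control flow in unionfs_writepage():
 *   lower_found - find_or_create_page() succeeded
 *   for_reclaim - wbc->for_reclaim was set
 *   lower_ret   - what the lower ->writepage would return
 */
static struct wp_outcome writepage_model(bool lower_found, bool for_reclaim,
					 int lower_ret)
{
	struct wp_outcome o = { 0, false, false, false };

	if (!lower_found) {		/* retry later, report success */
		o.redirty_upper = true;
		return o;
	}
	if (for_reclaim) {		/* data already copied down; skip I/O */
		o.unlock_lower = true;
		return o;
	}
	o.called_lower = true;
	o.err = lower_ret;
	if (lower_ret == WP_ACTIVATE_SKETCH) {	/* lower left its page locked */
		o.err = 0;
		o.unlock_lower = true;
	}
	return o;
}
```

Negative lower results pass straight through, matching the early `goto out_release` in the patch.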

[PATCH 25/42] Unionfs: super_block operations

2007-12-09 Thread Erez Zadok
Includes read_inode, delete_inode, put_super, statfs, remount_fs (which
supports branch-management ops), clear_inode, alloc_inode, destroy_inode,
write_inode, umount_begin, and show_options.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/super.c | 1020 
 1 files changed, 1020 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/super.c

diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
new file mode 100644
index 000..d9cf2a7
--- /dev/null
+++ b/fs/unionfs/super.c
@@ -0,0 +1,1020 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * The inode cache is used with alloc_inode for both our inode info and the
+ * vfs inode.
+ */
+static struct kmem_cache *unionfs_inode_cachep;
+
+static void unionfs_read_inode(struct inode *inode)
+{
+   int size;
+   struct unionfs_inode_info *info = UNIONFS_I(inode);
+
+   unionfs_read_lock(inode->i_sb);
+
+   memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
+   info->bstart = -1;
+   info->bend = -1;
+   atomic_set(&info->generation,
+  atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
+   spin_lock_init(&info->rdlock);
+   info->rdcount = 1;
+   info->hashsize = -1;
+   INIT_LIST_HEAD(&info->readdircache);
+
+   size = sbmax(inode->i_sb) * sizeof(struct inode *);
+   info->lower_inodes = kzalloc(size, GFP_KERNEL);
+   if (unlikely(!info->lower_inodes)) {
+   printk(KERN_CRIT "unionfs: no kernel memory when allocating "
+  "lower-pointer array!\n");
+   BUG();
+   }
+
+   inode->i_version++;
+   inode->i_op = &unionfs_main_iops;
+   inode->i_fop = &unionfs_main_fops;
+
+   inode->i_mapping->a_ops = &unionfs_aops;
+
+   unionfs_read_unlock(inode->i_sb);
+}
+
+/*
+ * we now define delete_inode, because there are two VFS paths that may
+ * destroy an inode: one of them calls clear_inode before doing everything
+ * else that's needed, and the other is fine.  This way we truncate the inode
+ * size (and its pages) and then clear our own inode, which will do an iput
+ * on our and the lower inode.
+ *
+ * No need to lock sb info's rwsem.
+ */
+static void unionfs_delete_inode(struct inode *inode)
+{
+   i_size_write(inode, 0); /* every f/s seems to do that */
+
+   if (inode->i_data.nrpages)
+   truncate_inode_pages(&inode->i_data, 0);
+
+   clear_inode(inode);
+}
+
+/*
+ * final actions when unmounting a file system
+ *
+ * No need to lock rwsem.
+ */
+static void unionfs_put_super(struct super_block *sb)
+{
+   int bindex, bstart, bend;
+   struct unionfs_sb_info *spd;
+   int leaks = 0;
+
+   spd = UNIONFS_SB(sb);
+   if (!spd)
+   return;
+
+   bstart = sbstart(sb);
+   bend = sbend(sb);
+
+   /* Make sure we have no leaks of branchget/branchput. */
+   for (bindex = bstart; bindex <= bend; bindex++)
+   if (unlikely(branch_count(sb, bindex) != 0)) {
+   printk(KERN_CRIT
+  "unionfs: branch %d has %d references left!\n",
+  bindex, branch_count(sb, bindex));
+   leaks = 1;
+   }
+   BUG_ON(leaks != 0);
+
+   kfree(spd->data);
+   kfree(spd);
+   sb->s_fs_info = NULL;
+}
+
+/*
+ * Since people use this to answer the "How big of a file can I write?"
+ * question, we report the size of the highest priority branch as the size of
+ * the union.
+ */
+static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+   int err = 0;
+   struct super_block *sb;
+   struct dentry *lower_dentry;
+
+   sb = dentry->d_sb;
+
+   unionfs_read_lock(sb);
+   unionfs_lock_dentry(dentry);
+
+   if (unlikely(!__unionfs_d_revalidate_chain(dentry, NULL, false))) {
+   err = -ESTALE;
+   goto out;
+   }
+   unionfs_check_dentry(dentry);
+
+   lower_dentry = unionfs_lower_dentry(sb->s_root);
+   err = vfs_statfs(lower_dentry, buf);
+
+   /* set return buf to our f/s to avoid confusing user-level utils */
+   buf->f_type = UNIONFS_SUPER_MAGIC;
+   /*
+* Our maximum file name is shorter by a few bytes because every
+* 

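The leak check in unionfs_put_super() above walks every branch and refuses to finish the unmount while any branchget/branchput count is non-zero. The userspace C sketch below mirrors that scan under stated assumptions: struct sk_sb_info and sk_count_branch_leaks() are hypothetical stand-ins for struct unionfs_sb_info and branch_count(), and the sketch returns a count instead of calling BUG_ON().

```c
#include <assert.h>
#include <stdio.h>

#define SK_MAX_BRANCHES 8

/* Hypothetical stand-in for struct unionfs_sb_info: one open-reference
 * counter per branch, valid for indices bstart..bend. */
struct sk_sb_info {
	int bstart;
	int bend;
	int branch_refs[SK_MAX_BRANCHES];
};

/*
 * Same scan as unionfs_put_super(): report every branch whose
 * branchget/branchput count did not return to zero.  Unlike the kernel
 * code, which sets a flag and then BUG()s, this returns the number of
 * leaking branches so the caller can decide what to do.
 */
static int sk_count_branch_leaks(const struct sk_sb_info *spd)
{
	int bindex, leaks = 0;

	assert(spd->bstart >= 0 && spd->bend < SK_MAX_BRANCHES);
	for (bindex = spd->bstart; bindex <= spd->bend; bindex++) {
		if (spd->branch_refs[bindex] != 0) {
			fprintf(stderr,
				"unionfs: branch %d has %d references left!\n",
				bindex, spd->branch_refs[bindex]);
			leaks++;
		}
	}
	return leaks;
}
```

A non-zero return corresponds to the condition the patch turns into BUG_ON(leaks != 0).
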
[PATCH 37/42] VFS: export release_open_intent symbol

2007-12-09 Thread Erez Zadok
Needed to release the resources of the lower nameidata structures that we
create and pass to lower file systems (e.g., when calling vfs_create).

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/namei.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3b993db..14f9861 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -389,6 +389,7 @@ void release_open_intent(struct nameidata *nd)
else
fput(nd->intent.open.file);
 }
+EXPORT_SYMBOL(release_open_intent);
 
 static inline struct dentry *
 do_revalidate(struct dentry *dentry, struct nameidata *nd)
-- 
1.5.2.2


[PATCH 32/42] Unionfs file system magic number

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 include/linux/magic.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/magic.h b/include/linux/magic.h
index 1fa0c2c..67043ed 100644
--- a/include/linux/magic.h
+++ b/include/linux/magic.h
@@ -35,6 +35,8 @@
 #define REISER2FS_SUPER_MAGIC_STRING   "ReIsEr2Fs"
 #define REISER2FS_JR_SUPER_MAGIC_STRING	"ReIsEr3Fs"
 
+#define UNIONFS_SUPER_MAGIC 0xf15f083d
+
 #define SMB_SUPER_MAGIC		0x517B
 #define USBDEVICE_SUPER_MAGIC  0x9fa2
 #define CGROUP_SUPER_MAGIC 0x27e0eb
-- 
1.5.2.2

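unionfs_statfs() in PATCH 25 stamps buf->f_type with this value so user-level tools see a unionfs mount rather than the filesystem of the highest-priority branch. A minimal userspace check follows, assuming glibc's statfs(2) from <sys/vfs.h>; sk_is_unionfs() is a hypothetical helper name, and the constant is copied from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/vfs.h>

/* Value added to include/linux/magic.h by this patch. */
#define UNIONFS_SUPER_MAGIC 0xf15f083d

/*
 * Return true if @path lives on a unionfs mount.  Returns false if it
 * does not, and also if statfs(2) fails (errno is left set).
 */
static bool sk_is_unionfs(const char *path)
{
	struct statfs buf;

	if (statfs(path, &buf) != 0)
		return false;
	/* Compare as unsigned: f_type is a signed type on some ABIs. */
	return (unsigned long)buf.f_type == UNIONFS_SUPER_MAGIC;
}
```
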

[PATCH 33/42] MM: extern for drop_pagecache_sb

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 include/linux/mm.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1b7b95c..fc61bd3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -19,6 +19,7 @@ struct anon_vma;
 struct file_ra_state;
 struct user_struct;
 struct writeback_control;
+struct super_block;
 
 #ifndef CONFIG_DISCONTIGMEM  /* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
@@ -1135,6 +1136,7 @@ int drop_caches_sysctl_handler(struct ctl_table *, int, struct file *,
void __user *, size_t *, loff_t *);
 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
unsigned long lru_pages);
+extern void drop_pagecache_sb(struct super_block *);
 void drop_pagecache(void);
 void drop_slab(void);
 
-- 
1.5.2.2


[PATCH 14/42] Unionfs: lower-level copyup routines

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/copyup.c |  897 +++
 1 files changed, 897 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/copyup.c

diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
new file mode 100644
index 000..3fe4865
--- /dev/null
+++ b/fs/unionfs/copyup.c
@@ -0,0 +1,897 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * For detailed explanation of copyup see:
+ * Documentation/filesystems/unionfs/concepts.txt
+ */
+
+#ifdef CONFIG_UNION_FS_XATTR
+/* copyup all extended attrs for a given dentry */
+static int copyup_xattrs(struct dentry *old_lower_dentry,
+struct dentry *new_lower_dentry)
+{
+   int err = 0;
+   ssize_t list_size = -1;
+   char *name_list = NULL;
+   char *attr_value = NULL;
+   char *name_list_buf = NULL;
+
+   /* query the actual size of the xattr list */
+   list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
+   if (list_size <= 0) {
+   err = list_size;
+   goto out;
+   }
+
+   /* allocate space for the actual list */
+   name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX);
+   if (unlikely(!name_list || IS_ERR(name_list))) {
+   err = PTR_ERR(name_list);
+   goto out;
+   }
+
+   name_list_buf = name_list; /* save for kfree at end */
+
+   /* now get the actual xattr list of the source file */
+   list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
+   if (list_size <= 0) {
+   err = list_size;
+   goto out;
+   }
+
+   /* allocate space to hold each xattr's value */
+   attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
+   if (unlikely(!attr_value || IS_ERR(attr_value))) {
+   err = PTR_ERR(attr_value);
+   goto out;
+   }
+
+   /* in a loop, get and set each xattr from src to dst file */
+   while (*name_list) {
+   ssize_t size;
+
+   /* Lock here since vfs_getxattr doesn't lock for us */
+   mutex_lock(&old_lower_dentry->d_inode->i_mutex);
+   size = vfs_getxattr(old_lower_dentry, name_list,
+   attr_value, XATTR_SIZE_MAX);
+   mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
+   if (size < 0) {
+   err = size;
+   goto out;
+   }
+   if (size > XATTR_SIZE_MAX) {
+   err = -E2BIG;
+   goto out;
+   }
+   /* Don't lock here since vfs_setxattr does it for us. */
+   err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
+  size, 0);
+   /*
+* SELinux depends on "security.*" xattrs, so to maintain
+* the security of copied-up files, if SELinux is active,
+* then we must copy these xattrs as well.  So we need to
+* temporarily get FOWNER privileges.
+* XXX: move entire copyup code to SIOQ.
+*/
+   if (err == -EPERM && !capable(CAP_FOWNER)) {
+   cap_raise(current->cap_effective, CAP_FOWNER);
+   err = vfs_setxattr(new_lower_dentry, name_list,
+  attr_value, size, 0);
+   cap_lower(current->cap_effective, CAP_FOWNER);
+   }
+   if (err < 0)
+   goto out;
+   name_list += strlen(name_list) + 1;
+   }
+out:
+   unionfs_xattr_kfree(name_list_buf);
+   unionfs_xattr_kfree(attr_value);
+   /* Ignore if xattr isn't supported */
+   if (err == -ENOTSUPP || err == -EOPNOTSUPP)
+   err = 0;
+   return err;
+}
+#endif /* CONFIG_UNION_FS_XATTR */
+
+/*
+ * Determine the mode based on the copyup flags, and the existing dentry.
+ *
+ * Handle file systems which may not support certain options.  For example
+ * jffs2 doesn't allow one to chmod a symlink.  So we ignore such harmless
+ * errors, rather than propagating them up, which results in copyup errors
+ * and errors 

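copyup_xattrs() walks the buffer filled by vfs_listxattr(), which has the same layout as the listxattr(2) result: NUL-terminated names packed back to back. Its while (*name_list) loop stops at an empty string, so it relies on a zero byte just past the returned data. Since unionfs_xattr_alloc() uses kmalloc(), which does not zero memory, that byte does not appear to be guaranteed by the code shown. A sketch of a length-bounded walk follows; sk_count_xattr_names() is a hypothetical helper, not part of unionfs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a listxattr(2)-style buffer of @len bytes.  The
 * walk is bounded by @len, so it never reads past the returned data,
 * and a final name missing its NUL terminator is still counted.
 */
static size_t sk_count_xattr_names(const char *list, size_t len)
{
	size_t off = 0, count = 0;

	assert(list != NULL || len == 0);
	while (off < len) {
		const char *nul = memchr(list + off, '\0', len - off);
		size_t n = nul ? (size_t)(nul - (list + off)) : len - off;

		if (n == 0)	/* empty name: treat as end of list */
			break;
		count++;
		off += n + 1;	/* skip the name and its NUL terminator */
	}
	return count;
}
```
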
[PATCH 24/42] Unionfs: mount-time and stacking-interposition functions

2007-12-09 Thread Erez Zadok
Includes read_super and module-linkage routines.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/main.c |  783 +
 1 files changed, 783 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/main.c

diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
new file mode 100644
index 000..22aa6e6
--- /dev/null
+++ b/fs/unionfs/main.c
@@ -0,0 +1,783 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+#include 
+#include 
+
+static void unionfs_fill_inode(struct dentry *dentry,
+  struct inode *inode)
+{
+   struct inode *lower_inode;
+   struct dentry *lower_dentry;
+   int bindex, bstart, bend;
+
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!lower_dentry) {
+   unionfs_set_lower_inode_idx(inode, bindex, NULL);
+   continue;
+   }
+
+   /* Initialize the lower inode to the new lower inode. */
+   if (!lower_dentry->d_inode)
+   continue;
+
+   unionfs_set_lower_inode_idx(inode, bindex,
+   igrab(lower_dentry->d_inode));
+   }
+
+   ibstart(inode) = dbstart(dentry);
+   ibend(inode) = dbend(dentry);
+
+   /* Use attributes from the first branch. */
+   lower_inode = unionfs_lower_inode(inode);
+
+   /* Use different set of inode ops for symlinks & directories */
+   if (S_ISLNK(lower_inode->i_mode))
+   inode->i_op = &unionfs_symlink_iops;
+   else if (S_ISDIR(lower_inode->i_mode))
+   inode->i_op = &unionfs_dir_iops;
+
+   /* Use different set of file ops for directories */
+   if (S_ISDIR(lower_inode->i_mode))
+   inode->i_fop = &unionfs_dir_fops;
+
+   /* properly initialize special inodes */
+   if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
+   S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
+   init_special_inode(inode, lower_inode->i_mode,
+  lower_inode->i_rdev);
+
+   /* all well, copy inode attributes */
+   unionfs_copy_attr_all(inode, lower_inode);
+   fsstack_copy_inode_size(inode, lower_inode);
+}
+
+/*
+ * Connect a unionfs inode dentry/inode with several lower ones.  This is
+ * the classic stackable file system "vnode interposition" action.
+ *
+ * @sb: unionfs's super_block
+ */
+struct dentry *unionfs_interpose(struct dentry *dentry, struct super_block *sb,
+int flag)
+{
+   int err = 0;
+   struct inode *inode;
+   int is_negative_dentry = 1;
+   int bindex, bstart, bend;
+   int need_fill_inode = 1;
+   struct dentry *spliced = NULL;
+
+   verify_locked(dentry);
+
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   /* Make sure that we didn't get a negative dentry. */
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   if (unionfs_lower_dentry_idx(dentry, bindex) &&
+   unionfs_lower_dentry_idx(dentry, bindex)->d_inode) {
+   is_negative_dentry = 0;
+   break;
+   }
+   }
+   BUG_ON(is_negative_dentry);
+
+   /*
+* We allocate our new inode below, by calling iget.
+* iget will call our read_inode which will initialize some
+* of the new inode's fields.
+*/
+
+   /*
+* On revalidate we've already got our own inode and just need
+* to fix it up.
+*/
+   if (flag == INTERPOSE_REVAL) {
+   inode = dentry->d_inode;
+   UNIONFS_I(inode)->bstart = -1;
+   UNIONFS_I(inode)->bend = -1;
+   atomic_set(&UNIONFS_I(inode)->generation,
+  atomic_read(&UNIONFS_SB(sb)->generation));
+
+   UNIONFS_I(inode)->lower_inodes =
+   kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
+   if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
+   err = -ENOMEM;
+   goto 

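unionfs_interpose() begins by checking that at least one lower dentry in [bstart, bend] has an inode, and BUG()s otherwise. The sketch below isolates that scan in userspace C; struct sk_dentry, struct sk_inode and sk_union_is_negative() are hypothetical stand-ins for the VFS types and the inline loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct dentry. */
struct sk_inode {
	unsigned int i_mode;
};

struct sk_dentry {
	struct sk_inode *d_inode;	/* NULL for a negative dentry */
};

/*
 * Same scan as the start of unionfs_interpose(): the union object is
 * negative only if every lower dentry in [bstart, bend] is missing
 * (a NULL slot) or has no inode.
 */
static bool sk_union_is_negative(struct sk_dentry *const lower[],
				 int bstart, int bend)
{
	int bindex;

	assert(bstart >= 0);
	for (bindex = bstart; bindex <= bend; bindex++)
		if (lower[bindex] && lower[bindex]->d_inode)
			return false;
	return true;
}
```
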
[PATCH 41/42] eCryptfs: use simplified fs_stack API for inode operations

2007-12-09 Thread Erez Zadok
CC: Mike Halcrow <[EMAIL PROTECTED]>

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/ecryptfs/inode.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 0b1ab01..a846557 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -588,9 +588,9 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
lower_new_dir_dentry->d_inode, lower_new_dentry);
if (rc)
goto out_lock;
-   fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode, NULL);
+   fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode);
if (new_dir != old_dir)
-   fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode, NULL);
+   fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode);
 out_lock:
unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
dput(lower_new_dentry->d_parent);
@@ -924,7 +924,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
 
rc = notify_change(lower_dentry, ia);
 out:
-   fsstack_copy_attr_all(inode, lower_inode, NULL);
+   fsstack_copy_attr_all(inode, lower_inode);
return rc;
 }
 
-- 
1.5.2.2


[PATCH 38/42] VFS: simplified fsstack_copy_attr_all

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/stack.c |   30 +-
 1 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/stack.c b/fs/stack.c
index 67716f6..a548aac 100644
--- a/fs/stack.c
+++ b/fs/stack.c
@@ -1,8 +1,20 @@
+/*
+ * Copyright (c) 2006-2007 Erez Zadok
+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2006-2007 Stony Brook University
+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
 #include 
 #include 
 #include 
 
-/* does _NOT_ require i_mutex to be held.
+/*
+ * does _NOT_ require i_mutex to be held.
  *
  * This function cannot be inlined since i_size_{read,write} is rather
  * heavy-weight on 32-bit systems
@@ -14,11 +26,11 @@ void fsstack_copy_inode_size(struct inode *dst, const struct inode *src)
 }
 EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);
 
-/* copy all attributes; get_nlinks is optional way to override the i_nlink
+/*
+ * copy all attributes; get_nlinks is optional way to override the i_nlink
  * copying
  */
-void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
-   int (*get_nlinks)(struct inode *))
+void fsstack_copy_attr_all(struct inode *dest, const struct inode *src)
 {
dest->i_mode = src->i_mode;
dest->i_uid = src->i_uid;
@@ -29,14 +41,6 @@ void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
dest->i_ctime = src->i_ctime;
dest->i_blkbits = src->i_blkbits;
dest->i_flags = src->i_flags;
-
-   /*
-* Update the nlinks AFTER updating the above fields, because the
-* get_links callback may depend on them.
-*/
-   if (!get_nlinks)
-   dest->i_nlink = src->i_nlink;
-   else
-   dest->i_nlink = (*get_nlinks)(dest);
+   dest->i_nlink = src->i_nlink;
 }
 EXPORT_SYMBOL_GPL(fsstack_copy_attr_all);
-- 
1.5.2.2


[PATCH 15/42] Unionfs: dentry revalidation

2007-12-09 Thread Erez Zadok
Includes d_release methods and cache-coherency support for dentries.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/dentry.c |  498 +++
 1 files changed, 498 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/dentry.c

diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
new file mode 100644
index 000..7d27987
--- /dev/null
+++ b/fs/unionfs/dentry.c
@@ -0,0 +1,498 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * Revalidate a single dentry.
+ * Assume that dentry's info node is locked.
+ * Assume that parent(s) are all valid already, but
+ * the child may not yet be valid.
+ * Returns true if valid, false otherwise.
+ */
+static bool __unionfs_d_revalidate_one(struct dentry *dentry,
+  struct nameidata *nd)
+{
+   bool valid = true;  /* default is valid */
+   struct dentry *lower_dentry;
+   int bindex, bstart, bend;
+   int sbgen, dgen;
+   int positive = 0;
+   int locked = 0;
+   int interpose_flag;
+   struct nameidata lowernd; /* TODO: be gentler to the stack */
+
+   if (nd)
+   memcpy(&lowernd, nd, sizeof(struct nameidata));
+   else
+   memset(&lowernd, 0, sizeof(struct nameidata));
+
+   verify_locked(dentry);
+
+   /* if the dentry is unhashed, do NOT revalidate */
+   if (d_deleted(dentry))
+   goto out;
+
+   BUG_ON(dbstart(dentry) == -1);
+   if (dentry->d_inode)
+   positive = 1;
+   dgen = atomic_read(&UNIONFS_D(dentry)->generation);
+   sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
+   /*
+* If we are working on an unconnected dentry, then there is no
+* revalidation to be done, because this file does not exist within
+* the namespace, and Unionfs operates on the namespace, not data.
+*/
+   if (unlikely(sbgen != dgen)) {
+   struct dentry *result;
+   int pdgen;
+
+   /* The root entry should always be valid */
+   BUG_ON(IS_ROOT(dentry));
+
+   /* We can't work correctly if our parent isn't valid. */
+   pdgen = atomic_read(&UNIONFS_D(dentry->d_parent)->generation);
+   BUG_ON(pdgen != sbgen); /* should never happen here */
+
+   /* Free the pointers for our inodes and this dentry. */
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+   if (bstart >= 0) {
+   struct dentry *lower_dentry;
+   for (bindex = bstart; bindex <= bend; bindex++) {
+   lower_dentry =
+   unionfs_lower_dentry_idx(dentry,
+bindex);
+   dput(lower_dentry);
+   }
+   }
+   set_dbstart(dentry, -1);
+   set_dbend(dentry, -1);
+
+   interpose_flag = INTERPOSE_REVAL_NEG;
+   if (positive) {
+   interpose_flag = INTERPOSE_REVAL;
+   /*
+* During BRM, the VFS could already hold a lock on
+* a file being read, so don't lock it again
+* (deadlock), but if you lock it in this function,
+* then release it here too.
+*/
+   if (!mutex_is_locked(&dentry->d_inode->i_mutex)) {
+   mutex_lock(&dentry->d_inode->i_mutex);
+   locked = 1;
+   }
+
+   bstart = ibstart(dentry->d_inode);
+   bend = ibend(dentry->d_inode);
+   if (bstart >= 0) {
+   struct inode *lower_inode;
+   for (bindex = bstart; bindex <= bend;
+bindex++) {
+   lower_inode =
+   unionfs_lower_inode_idx(
+   dentry->d_inode,
+   bindex);
+

[PATCH 29/42] Unionfs: miscellaneous helper routines

2007-12-09 Thread Erez Zadok
Mostly related to whiteouts.

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/subr.c |  242 +
 1 files changed, 242 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/subr.c

diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
new file mode 100644
index 000..1a26c57
--- /dev/null
+++ b/fs/unionfs/subr.c
@@ -0,0 +1,242 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * Pass a unionfs dentry and a branch index.  It will try to create a whiteout
+ * for the filename in dentry, and will try in branch 'start'.  On error,
+ * it will proceed to a branch to the left.
+ */
+int create_whiteout(struct dentry *dentry, int start)
+{
+   int bstart, bend, bindex;
+   struct dentry *lower_dir_dentry;
+   struct dentry *lower_dentry;
+   struct dentry *lower_wh_dentry;
+   struct nameidata nd;
+   char *name = NULL;
+   int err = -EINVAL;
+
+   verify_locked(dentry);
+
+   bstart = dbstart(dentry);
+   bend = dbend(dentry);
+
+   /* create dentry's whiteout equivalent */
+   name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
+   if (unlikely(IS_ERR(name))) {
+   err = PTR_ERR(name);
+   goto out;
+   }
+
+   for (bindex = start; bindex >= 0; bindex--) {
+   lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+
+   if (!lower_dentry) {
+   /*
+* if lower dentry is not present, create the
+* entire lower dentry directory structure and go
+* ahead.  Since we want to just create whiteout, we
+* only want the parent dentry, and hence get rid of
+* this dentry.
+*/
+   lower_dentry = create_parents(dentry->d_inode,
+ dentry,
+ dentry->d_name.name,
+ bindex);
+   if (!lower_dentry || IS_ERR(lower_dentry)) {
+   int ret = PTR_ERR(lower_dentry);
+   if (!IS_COPYUP_ERR(ret))
+   printk(KERN_ERR
+  "unionfs: create_parents for "
+  "whiteout failed: bindex=%d "
+  "err=%d\n", bindex, ret);
+   continue;
+   }
+   }
+
+   lower_wh_dentry =
+   lookup_one_len(name, lower_dentry->d_parent,
+  dentry->d_name.len + UNIONFS_WHLEN);
+   if (IS_ERR(lower_wh_dentry))
+   continue;
+
+   /*
+* The whiteout already exists. This used to be impossible,
+* but now is possible because of opaqueness.
+*/
+   if (lower_wh_dentry->d_inode) {
+   dput(lower_wh_dentry);
+   err = 0;
+   goto out;
+   }
+
+   err = init_lower_nd(&nd, LOOKUP_CREATE);
+   if (unlikely(err < 0))
+   goto out;
+   lower_dir_dentry = lock_parent(lower_wh_dentry);
+   err = is_robranch_super(dentry->d_sb, bindex);
+   if (!err)
+   err = vfs_create(lower_dir_dentry->d_inode,
+lower_wh_dentry,
+~current->fs->umask & S_IRWXUGO,
+&nd);
+   unlock_dir(lower_dir_dentry);
+   dput(lower_wh_dentry);
+   release_lower_nd(&nd, err);
+
+   if (!err || !IS_COPYUP_ERR(err))
+   break;
+   }
+
+   /* set dbopaque so that lookup will not proceed after this branch */
+   if (!err)
+   set_dbopaque(dentry, bindex);
+
+out:
+   kfree(name);
+   return err;
+}
+
+/*
+ * This is a helper function for rename, which ends up with 

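create_whiteout() builds the whiteout name, then tries branch start and moves left towards branch 0 until a branch accepts the file, continuing only on copyup-class errors. The sketch below models both steps in userspace C. The ".wh." prefix is an assumption (the patch only shows that the whiteout name is UNIONFS_WHLEN bytes longer than the original), and the writable[] array is a hypothetical stand-in for whether vfs_create() succeeds on each branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed whiteout prefix and its length (hypothetical names). */
#define SK_WHPFX ".wh."
#define SK_WHLEN (sizeof(SK_WHPFX) - 1)

/* Analogue of alloc_whname(): prefix + name; caller frees. */
static char *sk_alloc_whname(const char *name, size_t len)
{
	char *buf = malloc(len + SK_WHLEN + 1);

	if (!buf)
		return NULL;
	memcpy(buf, SK_WHPFX, SK_WHLEN);
	memcpy(buf + SK_WHLEN, name, len);
	buf[SK_WHLEN + len] = '\0';
	return buf;
}

/*
 * Branch loop of create_whiteout(): start at @start and move left
 * until a branch accepts the whiteout.  Every failure is treated as a
 * copyup-class error that allows trying the next branch.  Returns the
 * branch index used, or -1 if none accepted it.
 */
static int sk_pick_whiteout_branch(const bool writable[], int start)
{
	int bindex;

	assert(start >= 0);
	for (bindex = start; bindex >= 0; bindex--)
		if (writable[bindex])
			return bindex;
	return -1;
}
```
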
[PATCH 31/42] VFS: fs_stack header cleanups

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 include/linux/fs_stack.h |   21 -
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
index bb516ce..6b52faf 100644
--- a/include/linux/fs_stack.h
+++ b/include/linux/fs_stack.h
@@ -1,17 +1,28 @@
+/*
+ * Copyright (c) 2006-2007 Erez Zadok
+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2006-2007 Stony Brook University
+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
 #ifndef _LINUX_FS_STACK_H
 #define _LINUX_FS_STACK_H
 
-/* This file defines generic functions used primarily by stackable
+/*
+ * This file defines generic functions used primarily by stackable
  * filesystems; none of these functions require i_mutex to be held.
  */
 
 #include 
 
 /* externs for fs/stack.c */
-extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
-   int (*get_nlinks)(struct inode *));
-
-extern void fsstack_copy_inode_size(struct inode *dst, const struct inode *src);
+extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src);
+extern void fsstack_copy_inode_size(struct inode *dst,
+   const struct inode *src);
 
 /* inlines */
 static inline void fsstack_copy_attr_atime(struct inode *dest,
-- 
1.5.2.2


[PATCH 26/42] Unionfs: extended attributes operations

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/xattr.c |  153 
 1 files changed, 153 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/xattr.c

diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
new file mode 100644
index 000..00c6d0d
--- /dev/null
+++ b/fs/unionfs/xattr.c
@@ -0,0 +1,153 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* This is lifted from fs/xattr.c */
+void *unionfs_xattr_alloc(size_t size, size_t limit)
+{
+   void *ptr;
+
+   if (size > limit)
+   return ERR_PTR(-E2BIG);
+
+   if (!size)  /* size request, no buffer is needed */
+   return NULL;
+
+   ptr = kmalloc(size, GFP_KERNEL);
+   if (unlikely(!ptr))
+   return ERR_PTR(-ENOMEM);
+   return ptr;
+}
+
+/*
+ * BKL held by caller.
+ * dentry->d_inode->i_mutex locked
+ */
+ssize_t unionfs_getxattr(struct dentry *dentry, const char *name, void *value,
+size_t size)
+{
+   struct dentry *lower_dentry = NULL;
+   int err = -EOPNOTSUPP;
+
+   unionfs_read_lock(dentry->d_sb);
+   unionfs_lock_dentry(dentry);
+
+   if (unlikely(!__unionfs_d_revalidate_chain(dentry, NULL, false))) {
+   err = -ESTALE;
+   goto out;
+   }
+
+   lower_dentry = unionfs_lower_dentry(dentry);
+
+   err = vfs_getxattr(lower_dentry, (char *) name, value, size);
+
+out:
+   unionfs_check_dentry(dentry);
+   unionfs_unlock_dentry(dentry);
+   unionfs_read_unlock(dentry->d_sb);
+   return err;
+}
+
+/*
+ * BKL held by caller.
+ * dentry->d_inode->i_mutex locked
+ */
+int unionfs_setxattr(struct dentry *dentry, const char *name,
+const void *value, size_t size, int flags)
+{
+   struct dentry *lower_dentry = NULL;
+   int err = -EOPNOTSUPP;
+
+   unionfs_read_lock(dentry->d_sb);
+   unionfs_lock_dentry(dentry);
+
+   if (unlikely(!__unionfs_d_revalidate_chain(dentry, NULL, false))) {
+   err = -ESTALE;
+   goto out;
+   }
+
+   lower_dentry = unionfs_lower_dentry(dentry);
+
+   err = vfs_setxattr(lower_dentry, (char *) name, (void *) value,
+  size, flags);
+
+out:
+   unionfs_check_dentry(dentry);
+   unionfs_unlock_dentry(dentry);
+   unionfs_read_unlock(dentry->d_sb);
+   return err;
+}
+
+/*
+ * BKL held by caller.
+ * dentry->d_inode->i_mutex locked
+ */
+int unionfs_removexattr(struct dentry *dentry, const char *name)
+{
+   struct dentry *lower_dentry = NULL;
+   int err = -EOPNOTSUPP;
+
+   unionfs_read_lock(dentry->d_sb);
+   unionfs_lock_dentry(dentry);
+
+   if (unlikely(!__unionfs_d_revalidate_chain(dentry, NULL, false))) {
+   err = -ESTALE;
+   goto out;
+   }
+
+   lower_dentry = unionfs_lower_dentry(dentry);
+
+   err = vfs_removexattr(lower_dentry, (char *) name);
+
+out:
+   unionfs_check_dentry(dentry);
+   unionfs_unlock_dentry(dentry);
+   unionfs_read_unlock(dentry->d_sb);
+   return err;
+}
+
+/*
+ * BKL held by caller.
+ * dentry->d_inode->i_mutex locked
+ */
+ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
+{
+   struct dentry *lower_dentry = NULL;
+   int err = -EOPNOTSUPP;
+   char *encoded_list = NULL;
+
+   unionfs_read_lock(dentry->d_sb);
+   unionfs_lock_dentry(dentry);
+
+   if (unlikely(!__unionfs_d_revalidate_chain(dentry, NULL, false))) {
+   err = -ESTALE;
+   goto out;
+   }
+
+   lower_dentry = unionfs_lower_dentry(dentry);
+
+   encoded_list = list;
+   err = vfs_listxattr(lower_dentry, encoded_list, size);
+
+out:
+   unionfs_check_dentry(dentry);
+   unionfs_unlock_dentry(dentry);
+   unionfs_read_unlock(dentry->d_sb);
+   return err;
+}
-- 
1.5.2.2

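unionfs_xattr_alloc() reports failure through the kernel's error-pointer convention (ERR_PTR(), IS_ERR() and PTR_ERR() from <linux/err.h>): an errno value stored in the top 4095 addresses of the pointer range. The userspace sketch below reimplements those helpers under hypothetical sk_* names to show the contract. A caller must apply the decode helper to the same pointer it tested; decoding a different, valid pointer yields a meaningless value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace copies of the <linux/err.h> helpers (sketch). */
#define SK_MAX_ERRNO 4095

static inline void *sk_err_ptr(long error)
{
	return (void *)error;
}

static inline long sk_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int sk_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO;
}

/*
 * Same contract as unionfs_xattr_alloc(): an oversized request yields
 * -E2BIG encoded in the pointer, a zero-size request yields NULL, and
 * anything else is a real buffer or -ENOMEM.
 */
static void *sk_xattr_alloc(size_t size, size_t limit)
{
	void *ptr;

	if (size > limit)
		return sk_err_ptr(-E2BIG);
	if (!size)
		return NULL;
	ptr = malloc(size);
	if (!ptr)
		return sk_err_ptr(-ENOMEM);
	return ptr;
}
```
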

[PATCH 09/42] Unionfs: main Makefile

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/Makefile |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/Makefile

diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
new file mode 100644
index 0000000..17ca4a7
--- /dev/null
+++ b/fs/unionfs/Makefile
@@ -0,0 +1,13 @@
+obj-$(CONFIG_UNION_FS) += unionfs.o
+
+unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
+   rdstate.o copyup.o dirhelper.o rename.o unlink.o \
+   lookup.o commonfops.o dirfops.o sioq.o mmap.o
+
+unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
+
+unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o
+
+ifeq ($(CONFIG_UNION_FS_DEBUG),y)
+EXTRA_CFLAGS += -DDEBUG
+endif
-- 
1.5.2.2



[PATCH 28/42] Unionfs: async I/O queue operations

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/sioq.c |  119 +
 1 files changed, 119 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/sioq.c

diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
new file mode 100644
index 0000000..2a8c88e
--- /dev/null
+++ b/fs/unionfs/sioq.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2006-2007 Erez Zadok
+ * Copyright (c) 2006  Charles P. Wright
+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2006  Junjiro Okajima
+ * Copyright (c) 2006  David P. Quigley
+ * Copyright (c) 2006-2007 Stony Brook University
+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/*
+ * Super-user IO work Queue - sometimes we need to perform actions which
+ * would fail due to the unix permissions on the parent directory (e.g.,
+ * rmdir a directory which appears empty, but in reality contains
+ * whiteouts).
+ */
+
+static struct workqueue_struct *superio_workqueue;
+
+int __init init_sioq(void)
+{
+   int err;
+
+   superio_workqueue = create_workqueue("unionfs_siod");
+   if (superio_workqueue)
+   return 0;
+
+   err = -ENOMEM;
+   printk(KERN_ERR "unionfs: create_workqueue failed %d\n", err);
+   superio_workqueue = NULL;
+   return err;
+}
+
+void stop_sioq(void)
+{
+   if (superio_workqueue)
+   destroy_workqueue(superio_workqueue);
+}
+
+void run_sioq(work_func_t func, struct sioq_args *args)
+{
+   INIT_WORK(&args->work, func);
+
+   init_completion(&args->comp);
+   while (!queue_work(superio_workqueue, &args->work)) {
+   /* TODO: do accounting if needed */
+   schedule();
+   }
+   wait_for_completion(&args->comp);
+}
+
+void __unionfs_create(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+   struct create_args *c = &args->create;
+
+   args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
+   complete(&args->comp);
+}
+
+void __unionfs_mkdir(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+   struct mkdir_args *m = &args->mkdir;
+
+   args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
+   complete(&args->comp);
+}
+
+void __unionfs_mknod(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+   struct mknod_args *m = &args->mknod;
+
+   args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
+   complete(&args->comp);
+}
+
+void __unionfs_symlink(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+   struct symlink_args *s = &args->symlink;
+
+   args->err = vfs_symlink(s->parent, s->dentry, s->symbuf, s->mode);
+   complete(&args->comp);
+}
+
+void __unionfs_unlink(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+   struct unlink_args *u = &args->unlink;
+
+   args->err = vfs_unlink(u->parent, u->dentry);
+   complete(&args->comp);
+}
+
+void __delete_whiteouts(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+   struct deletewh_args *d = &args->deletewh;
+
+   args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
+   complete(&args->comp);
+}
+
+void __is_opaque_dir(struct work_struct *work)
+{
+   struct sioq_args *args = container_of(work, struct sioq_args, work);
+
+   args->ret = lookup_one_len(UNIONFS_DIR_OPAQUE, args->is_opaque.dentry,
+  sizeof(UNIONFS_DIR_OPAQUE) - 1);
+   complete(&args->comp);
+}
-- 
1.5.2.2


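For readers unfamiliar with the sioq pattern: run_sioq() queues a work item on the dedicated unionfs_siod workqueue and sleeps on a completion until the handler (one of the __unionfs_* functions) stores its result in args->err and calls complete(). The userspace sketch that follows mimics only that hand-off and wait, assuming a POSIX thread in place of the workqueue and a mutex/condition-variable pair in place of struct completion. It does not model the permission aspect that motivates the queue, and every name in it is hypothetical rather than unionfs API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void completion_signal(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Rough analogue of struct sioq_args: inputs, result, completion. */
struct sioq_args_sketch {
	struct completion_sketch comp;
	void (*func)(struct sioq_args_sketch *args);
	int input;
	int err;
};

/* Worker thread: run the requested operation, then wake the caller. */
static void *sioq_worker(void *arg)
{
	struct sioq_args_sketch *args = arg;

	args->func(args);
	completion_signal(&args->comp);
	return NULL;
}

/* Analogue of run_sioq(): hand work to another thread, block until done. */
static int run_sioq_sketch(void (*func)(struct sioq_args_sketch *),
			   struct sioq_args_sketch *args)
{
	pthread_t tid;

	assert(func != NULL && args != NULL);
	args->func = func;
	completion_init(&args->comp);
	if (pthread_create(&tid, NULL, sioq_worker, args) != 0) {
		args->err = -EAGAIN;
	} else {
		completion_wait(&args->comp);
		pthread_join(tid, NULL);
	}
	pthread_cond_destroy(&args->comp.cond);
	pthread_mutex_destroy(&args->comp.lock);
	return args->err;
}

/* Example operation: fails on bad input, the way a vfs_* call might. */
static void example_op(struct sioq_args_sketch *args)
{
	args->err = args->input < 0 ? -EINVAL : 0;
}
```

As in the patch, the caller reads the result only after the wait returns, so the worker's write to err is ordered before the read by the lock/unlock pair.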

[PATCH 04/42] Unionfs: usage documentation for users

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/unionfs/usage.txt |  115 +++
 1 files changed, 115 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/unionfs/usage.txt

diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt
new file mode 100644
index 0000000..a6b1aca
--- /dev/null
+++ b/Documentation/filesystems/unionfs/usage.txt
@@ -0,0 +1,115 @@
+Unionfs is a stackable unification file system, which can appear to merge
+the contents of several directories (branches), while keeping their physical
+content separate.  Unionfs is useful for unified source tree management,
+merged contents of split CD-ROMs, merged separate software package
+directories, data grids, and more.  Unionfs allows any mix of read-only and
+read-write branches, as well as insertion and deletion of branches anywhere
+in the fan-out.  To maintain Unix semantics, Unionfs handles elimination of
+duplicates, partial-error conditions, and more.
+
+# mount -t unionfs -o branch-option[,union-options[,...]] none MOUNTPOINT
+
+The available branch-option for the mount command is:
+
+   dirs=branch[=ro|=rw][:...]
+
+specifies a colon-separated list of the directories that compose the union.
+Directories that come earlier in the list have a higher precedence than
+those which come later. Additionally, read-only or read-write permissions of
+the branch can be specified by appending =ro or =rw (default) to each
+directory.
+
+Syntax:
+
+   dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw]
+
+Example:
+
+   dirs=/writable_branch=rw:/read-only_branch=ro
+
+
+DYNAMIC BRANCH MANAGEMENT AND REMOUNTS
+======================================
+
+You can remount a union and change its overall mode, or reconfigure the
+branches, as follows.
+
+To downgrade a union from read-write to read-only:
+
+# mount -t unionfs -o remount,ro none MOUNTPOINT
+
+To upgrade a union from read-only to read-write:
+
+# mount -t unionfs -o remount,rw none MOUNTPOINT
+
+To delete a branch /foo, regardless of where it is in the current union:
+
+# mount -t unionfs -o remount,del=/foo none MOUNTPOINT
+
+To insert (add) a branch /foo before /bar:
+
+# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT
+
+To insert (add) a branch /foo (with the "rw" mode flag) before /bar:
+
+# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT
+
+To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a
+new highest-priority branch), you can use the above syntax, or use a
+shorthand version as follows:
+
+# mount -t unionfs -o remount,add=/foo none MOUNTPOINT
+
+To append a branch to the very end (new lowest-priority branch):
+
+# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT
+
+To append a branch to the very end (new lowest-priority branch), in
+read-only mode:
+
+# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT
+
+Finally, to change the mode of one existing branch, say /foo, from read-only
+to read-write, and change /bar from read-write to read-only:
+
+# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT
+
+Note: in Unionfs 2.x, you cannot set the leftmost branch to readonly because
+then Unionfs would have no writable branch in which to perform copyups.
+Moreover, the VFS can get confused when it tries to modify something in a
+file system mounted read-write, but isn't permitted to write to it.
+Instead, you should set the whole union as readonly, as described above.
+If, however, you must set the leftmost branch as readonly, perhaps so you
+can get a snapshot of it at a point in time, then you should insert a new
+writable top-level branch, and mark the one you want as readonly.  This can
+be accomplished as follows, assuming that /foo is your current leftmost
+branch:
+
+# mount -t tmpfs -o size=NNN /new
+# mount -t unionfs -o remount,add=/new,mode=/foo=ro none MOUNTPOINT
+
+# mount -t unionfs -o remount,del=/new,mode=/foo=rw none MOUNTPOINT
+
+# umount /new
+
+CACHE CONSISTENCY
+=================
+
+If you modify any file on any of the lower branches directly, while there is
+a Unionfs 2.1 mounted above any of those branches, you should tell Unionfs
+to purge its caches and re-get the objects.  To do that, you have to
+increment the generation number of the superblock using the following
+command:
+
+# mount -t unionfs -o remount,incgen none MOUNTPOINT
+
+Note that the older way of incrementing the generation number using an
+ioctl is no longer supported in Unionfs 2.0 and newer.  Ioctls in general
+are not encouraged.  Plus, an ioctl is a per-file concept, whereas the
+generation number is a per-file-system concept.  Worse, such an ioctl
+requires an open file, which then has to be invalidated by the very nature
+of the generation number increase (read: the old generation increase ioctl
+was pretty racy).
+
+
+For more information, see .
-- 
1.5.2.2

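The dirs= syntax in usage.txt (a colon-separated list, each entry optionally suffixed with =ro or =rw, rw being the default, earlier entries having higher priority) can be illustrated with a small parser. This is a minimal sketch, not the unionfs option parser: it assumes branch paths contain no ':' or '=' characters and caps the number of branches and the path length; struct branch_sketch and parse_dirs_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BRANCHES_SKETCH 8
#define MAX_PATH_SKETCH 256

struct branch_sketch {
	char path[MAX_PATH_SKETCH];
	bool readonly;		/* "=ro"; "=rw" or no suffix means writable */
};

/*
 * Parse the value of a dirs= option such as "/a=rw:/b=ro:/c".
 * out[0] receives the first (highest-priority) branch.
 * Returns the number of branches parsed, or -1 on a malformed entry.
 */
static int parse_dirs_sketch(const char *value, struct branch_sketch *out,
			     int max)
{
	const char *p = value;
	int n = 0;

	assert(value != NULL && out != NULL);
	while (*p) {
		const char *end = strchr(p, ':');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *eq = memchr(p, '=', len);
		size_t pathlen = eq ? (size_t)(eq - p) : len;

		if (n >= max || pathlen == 0 || pathlen >= MAX_PATH_SKETCH)
			return -1;
		memcpy(out[n].path, p, pathlen);
		out[n].path[pathlen] = '\0';
		out[n].readonly = false;
		if (eq) {
			/* everything between '=' and the next ':' is the mode */
			size_t modelen = len - pathlen - 1;

			if (modelen == 2 && strncmp(eq + 1, "ro", 2) == 0)
				out[n].readonly = true;
			else if (modelen != 2 || strncmp(eq + 1, "rw", 2) != 0)
				return -1;
		}
		n++;
		if (!end)
			break;
		p = end + 1;
	}
	return n;
}
```

For the documented example, dirs=/writable_branch=rw:/read-only_branch=ro, this yields two branches, the first writable and the second read-only.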

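The remount add= forms in usage.txt differ only in where the new branch lands: add=/bar:/foo places /foo immediately before /bar, add=/foo makes it the new first (highest-priority) branch, and add=:/foo appends it as the last one. A hypothetical helper restating those placement rules on an array of existing branch paths, assuming paths contain no ':' and ignoring any =ro/=rw suffix on the new branch:

```c
#include <assert.h>
#include <string.h>

/*
 * Compute the index at which "add=VALUE" inserts a branch into a list of
 * n existing branch paths (index 0 = highest priority).
 *   "/foo"      -> 0  (shorthand: new highest-priority branch)
 *   ":/foo"     -> n  (append: new lowest-priority branch)
 *   "/bar:/foo" -> index of /bar, so /foo lands just before it
 * Returns -1 if the reference branch is not in the list.
 */
static int add_position_sketch(const char *value,
			       const char *const *branches, int n)
{
	const char *colon;
	size_t reflen;

	assert(value != NULL);
	colon = strchr(value, ':');
	if (!colon)
		return 0;
	if (colon == value)
		return n;
	reflen = (size_t)(colon - value);
	for (int i = 0; i < n; i++)
		if (strlen(branches[i]) == reflen &&
		    strncmp(branches[i], value, reflen) == 0)
			return i;
	return -1;
}
```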
[PATCH 22/42] Unionfs: unlink/rmdir operations

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/unlink.c |  236 +++
 1 files changed, 236 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/unlink.c

diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
new file mode 100644
index 000..423ff36
--- /dev/null
+++ b/fs/unionfs/unlink.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* unlink a file by creating a whiteout */
+static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
+{
+   struct dentry *lower_dentry;
+   struct dentry *lower_dir_dentry;
+   int bindex;
+   int err = 0;
+
+   err = unionfs_partial_lookup(dentry);
+   if (err)
+   goto out;
+
+   bindex = dbstart(dentry);
+
+   lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   if (!lower_dentry)
+   goto out;
+
+   lower_dir_dentry = lock_parent(lower_dentry);
+
+   /* avoid destroying the lower inode if the file is in use */
+   dget(lower_dentry);
+   err = is_robranch_super(dentry->d_sb, bindex);
+   if (!err)
+   err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
+   /* if vfs_unlink succeeded, update our inode's times */
+   if (!err)
+   unionfs_copy_attr_times(dentry->d_inode);
+   dput(lower_dentry);
+   fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
+   unlock_dir(lower_dir_dentry);
+
+   if (err && !IS_COPYUP_ERR(err))
+   goto out;
+
+   /*
+* We create whiteouts if (1) there was an error unlinking the main
+* file; (2) there is a lower priority file with the same name
* (dbopaque); (3) the branch in which the file resides is not the
* last (rightmost) branch.  The last rule is an optimization to avoid
* creating all those whiteouts if there's no chance they'd be
* masking any lower-priority branch, as well as when unionfs is used
* with only one branch (using only one branch, while odd, is still
+* possible).
+*/
+   if (err) {
+   if (dbstart(dentry) == 0)
+   goto out;
+   err = create_whiteout(dentry, dbstart(dentry) - 1);
+   } else if (dbopaque(dentry) != -1) {
+   err = create_whiteout(dentry, dbopaque(dentry));
+   } else if (dbstart(dentry) < sbend(dentry->d_sb)) {
+   err = create_whiteout(dentry, dbstart(dentry));
+   }
+
+out:
+   if (!err)
+   dentry->d_inode->i_nlink--;
+
+   /* We don't want to leave negative leftover dentries for revalidate. */
+   if (!err && (dbopaque(dentry) != -1))
+   update_bstart(dentry);
+
+   return err;
+}
+
+int unionfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+   int err = 0;
+
+   unionfs_read_lock(dentry->d_sb);
+   unionfs_lock_dentry(dentry);
+
+   if (unlikely(!__unionfs_d_revalidate_chain(dentry, NULL, false))) {
+   err = -ESTALE;
+   goto out;
+   }
+   unionfs_check_dentry(dentry);
+
+   err = unionfs_unlink_whiteout(dir, dentry);
+   /* call d_drop so the system "forgets" about us */
+   if (!err) {
+   if (!S_ISDIR(dentry->d_inode->i_mode))
+   unionfs_postcopyup_release(dentry);
+   d_drop(dentry);
+   /*
+* if unlink/whiteout succeeded, parent dir mtime has
+* changed
+*/
+   unionfs_copy_attr_times(dir);
+   }
+
+out:
+   if (!err) {
+   unionfs_check_dentry(dentry);
+   unionfs_check_inode(dir);
+   }
+   unionfs_unlock_dentry(dentry);
+   unionfs_read_unlock(dentry->d_sb);
+   return err;
+}
+
+static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
+  struct unionfs_dir_state *namelist)
+{
+   int err;
+   struct dentry *lower_dentry;
+   struct dentry *lower_dir_dentry = NULL;
+
+   /* Here we need to remove whiteout entries. */
+   err = delete_whiteouts(dentry, dbstart(dentry), namelist);
+   if (err)
+   goto out;
+
+   lower_dentry = 

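The whiteout rules in the comment inside unionfs_unlink_whiteout() can be restated as a pure function of branch indices. This sketch assumes bstart, bopaque and bend carry the meaning of dbstart(dentry), dbopaque(dentry) (-1 when unset) and sbend(sb) in the patch; whiteout_branch_sketch() itself is hypothetical and leaves out the locking, partial lookup and copyup-error classification done by the real code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the branch in which to create a whiteout after an unlink
 * attempt, or -1 if none is created.
 *   unlink_failed: the lower unlink failed with a copyup-type error
 *   bstart:        branch holding the visible (highest-priority) file
 *   bopaque:       opaque branch index, or -1 if none
 *   bend:          index of the last (rightmost) branch
 */
static int whiteout_branch_sketch(bool unlink_failed, int bstart,
				  int bopaque, int bend)
{
	assert(bstart >= 0 && bstart <= bend);
	if (unlink_failed)
		/* no higher-priority branch left: the unlink fails instead */
		return bstart == 0 ? -1 : bstart - 1;
	if (bopaque != -1)
		return bopaque;
	if (bstart < bend)
		/* a lower-priority branch may still hold the same name */
		return bstart;
	return -1;	/* file was in the last branch: nothing to mask */
}
```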
[PATCH 27/42] Unionfs: async I/O queue headers

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/sioq.h |   92 +
 1 files changed, 92 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/sioq.h

diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
new file mode 100644
index 0000000..afb71ee
--- /dev/null
+++ b/fs/unionfs/sioq.h
@@ -0,0 +1,92 @@
+/*
+ * Copyright (c) 2006-2007 Erez Zadok
+ * Copyright (c) 2006  Charles P. Wright
+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2006  Junjiro Okajima
+ * Copyright (c) 2006  David P. Quigley
+ * Copyright (c) 2006-2007 Stony Brook University
+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _SIOQ_H
+#define _SIOQ_H
+
+struct deletewh_args {
+   struct unionfs_dir_state *namelist;
+   struct dentry *dentry;
+   int bindex;
+};
+
+struct is_opaque_args {
+   struct dentry *dentry;
+};
+
+struct create_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   umode_t mode;
+   struct nameidata *nd;
+};
+
+struct mkdir_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   umode_t mode;
+};
+
+struct mknod_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   umode_t mode;
+   dev_t dev;
+};
+
+struct symlink_args {
+   struct inode *parent;
+   struct dentry *dentry;
+   char *symbuf;
+   umode_t mode;
+};
+
+struct unlink_args {
+   struct inode *parent;
+   struct dentry *dentry;
+};
+
+
+struct sioq_args {
+   struct completion comp;
+   struct work_struct work;
+   int err;
+   void *ret;
+
+   union {
+   struct deletewh_args deletewh;
+   struct is_opaque_args is_opaque;
+   struct create_args create;
+   struct mkdir_args mkdir;
+   struct mknod_args mknod;
+   struct symlink_args symlink;
+   struct unlink_args unlink;
+   };
+};
+
+/* Extern definitions for SIOQ functions */
+extern int __init init_sioq(void);
+extern void stop_sioq(void);
+extern void run_sioq(work_func_t func, struct sioq_args *args);
+
+/* Extern definitions for our privilege escalation helpers */
+extern void __unionfs_create(struct work_struct *work);
+extern void __unionfs_mkdir(struct work_struct *work);
+extern void __unionfs_mknod(struct work_struct *work);
+extern void __unionfs_symlink(struct work_struct *work);
+extern void __unionfs_unlink(struct work_struct *work);
+extern void __delete_whiteouts(struct work_struct *work);
+extern void __is_opaque_dir(struct work_struct *work);
+
+#endif /* not _SIOQ_H */
-- 
1.5.2.2



[PATCH 08/42] Makefile: hook to compile unionfs

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/Makefile |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/Makefile b/fs/Makefile
index 500cf15..e202288 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -118,3 +118,4 @@ obj-$(CONFIG_HPPFS) += hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
 obj-$(CONFIG_OCFS2_FS) += ocfs2/
 obj-$(CONFIG_GFS2_FS)   += gfs2/
+obj-$(CONFIG_UNION_FS) += unionfs/
-- 
1.5.2.2



[PATCH 03/42] Unionfs: documentation for general concepts

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/unionfs/concepts.txt |  199 
 1 files changed, 199 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/unionfs/concepts.txt

diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
new file mode 100644
index 0000000..7654ccc
--- /dev/null
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -0,0 +1,199 @@
+Unionfs 2.1 CONCEPTS:
+=====================
+
+This file describes the concepts needed by a namespace unification file
+system.
+
+
+Branch Priority:
+================
+
+Each branch is assigned a unique priority - starting from 0 (highest
+priority).  No two branches can have the same priority.
+
+
+Branch Mode:
+============
+
+Each branch is assigned a mode - read-write or read-only. This allows
+directories on media mounted read-write to be used in a read-only manner.
+
+
+Whiteouts:
+==========
+
+A whiteout removes a file name from the namespace. Whiteouts are needed when
+one attempts to remove a file on a read-only branch.
+
+Suppose we have a two-branch union, where branch 0 is read-write and branch
+1 is read-only, and branch 1 contains a file 'foo':
+
+./b0/
+./b1/
+./b1/foo
+
+The unified view would simply be:
+
+./union/
+./union/foo
+
+Since 'foo' is stored on a read-only branch, it cannot be removed. A
+whiteout is used to remove the name 'foo' from the unified namespace. Again,
+since branch 1 is read-only, the whiteout cannot be created there. So, we
+try on a higher priority (lower numerically) branch and create the whiteout
+there.
+
+./b0/
+./b0/.wh.foo
+./b1/
+./b1/foo
+
+Later, when Unionfs traverses branches (due to lookup or readdir), it
+eliminates 'foo' from the namespace (as well as the whiteout itself).
+
+
+Duplicate Elimination:
+======================
+
+It is possible for files on different branches to have the same name.
+Unionfs then has to select which instance of the file to show to the user.
+Given the fact that each branch has a priority associated with it, the
+simplest solution is to take the instance from the highest priority
+(numerically lowest value) and "hide" the others.
+
+
+Copyup:
+=======
+
+When a change is made to the contents of a file's data or meta-data, they
+have to be stored somewhere. The best way is to create a copy of the
+original file on a branch that is writable, and then redirect the write
+through to this copy. The copy must be made on a higher priority branch so
+that lookup and readdir return this newer "version" of the file rather than
+the original (see duplicate elimination).
+
+
+Cache Coherency:
+================
+
+Unionfs users often want to be able to modify files and directories directly
+on the lower branches, and have those changes be visible at the Unionfs
+level.  This means that data (e.g., pages) and meta-data (dentries, inodes,
+open files, etc.) have to be synchronized between the upper and lower
+layers.  In other words, the newest changes from a layer below have to be
+propagated to the Unionfs layer above.  If the two layers are not in sync, a
+cache incoherency ensues, which could lead to application failures and even
+oopses.  The Linux kernel, however, has a rather limited set of mechanisms
+to ensure this inter-layer cache coherency---so Unionfs has to do most of
+the hard work on its own.
+
+Maintaining Invariants:
+
+The way Unionfs ensures cache coherency is as follows.  At each entry point
+to a Unionfs file system method, we call a utility function to validate the
+primary objects of this method.  Generally, we call unionfs_file_revalidate
+on open files, and __unionfs_d_revalidate_chain on dentries (which also
+validates inodes).  These utility functions check to see whether the upper
+Unionfs object is in sync with any of the lower objects that it represents.
+The checks we perform include whether the Unionfs superblock has a newer
+generation number, or if any of the lower objects' mtimes or ctimes are
+newer.  (Note: generation numbers change when branch-management commands are
+issued, so in a way, maintaining cache coherency is also very important for
+branch-management.)  If indeed we determine that any Unionfs object is no
+longer in sync with its lower counterparts, then we rebuild that object
+similarly to how we do so for branch-management.
+
+While rebuilding Unionfs's objects, we also purge any page mappings and
+truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data).  This is to
+ensure that Unionfs will re-get the newer data from the lower branches.  We
+perform this purging only if the Unionfs operation in question is a reading
+operation; if Unionfs is performing a data writing operation (e.g., ->write,
+->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is
+because (1) a self-deadlock could occur and (2) the upper Unionfs pages are
+considered more authoritative anyway, as they are 

[PATCH 07/42] Unionfs maintainers

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 MAINTAINERS |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f3d7256..95f16f0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3805,6 +3805,15 @@ L:   linux-kernel@vger.kernel.org
 W: http://www.kernel.dk
 S: Maintained
 
+UNIONFS
+P: Erez Zadok
+M: [EMAIL PROTECTED]
+P: Josef "Jeff" Sipek
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://unionfs.filesystems.org
+S: Maintained
+
 USB ACM DRIVER
 P: Oliver Neukum
 M: [EMAIL PROTECTED]
-- 
1.5.2.2



[PATCH 05/42] Unionfs: documentation for any known issues

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/unionfs/issues.txt |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/unionfs/issues.txt

diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt
new file mode 100644
index 0000000..9db1d70
--- /dev/null
+++ b/Documentation/filesystems/unionfs/issues.txt
@@ -0,0 +1,24 @@
+KNOWN Unionfs 2.1 ISSUES:
+=========================
+
+1. Unionfs should not use lookup_one_len() on the underlying f/s as it
+   confuses NFSv4.  Currently, unionfs_lookup() passes lookup intents to the
+   lower file-system; this eliminates part of the problem.  The remaining
+   calls to lookup_one_len may need to be changed to pass an intent.  We are
+   currently introducing VFS changes to fs/namei.c's do_path_lookup() to
+   allow proper file lookup and opening in stackable file systems.
+
+2. Lockdep (a debugging feature) isn't aware of stacking, and so it
+   incorrectly complains about locking problems.  The problem boils down to
+   this: Lockdep considers all objects of a certain type to be in the same
+   class, for example, all inodes.  Lockdep doesn't like to see a lock held
+   on two inodes within the same task, and warns that it could lead to a
+   deadlock.  However, stackable file systems do precisely that: they lock
+   an upper object, and then a lower object, in a strict order to avoid
+   locking problems; in addition, Unionfs, as a fan-out file system, may
+   have to lock several lower inodes.  We are currently looking into Lockdep
+   to see how to make it aware of stackable file systems.  In the meantime,
+   if you get any warnings from Lockdep, you can safely ignore them (or feel
+   free to report them to the Unionfs maintainers, just to be sure).
+
+For more information, see .
-- 
1.5.2.2



[PATCH 02/42] Unionfs: unionfs documentation index

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/unionfs/00-INDEX |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/unionfs/00-INDEX

diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
new file mode 100644
index 0000000..96fdf67
--- /dev/null
+++ b/Documentation/filesystems/unionfs/00-INDEX
@@ -0,0 +1,10 @@
+00-INDEX
+   - this file.
+concepts.txt
+   - A brief introduction of concepts.
+issues.txt
+   - A summary of known issues with unionfs.
+rename.txt
+   - Information regarding rename operations.
+usage.txt
+   - Usage information and examples.
-- 
1.5.2.2



[PATCH 18/42] Unionfs: directory reading file operations

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/dirfops.c |  290 ++
 1 files changed, 290 insertions(+), 0 deletions(-)
 create mode 100644 fs/unionfs/dirfops.c

diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
new file mode 100644
index 0000000..88df635
--- /dev/null
+++ b/fs/unionfs/dirfops.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright (c) 2003-2007 Erez Zadok
+ * Copyright (c) 2003-2006 Charles P. Wright
+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
+ * Copyright (c) 2005-2006 Junjiro Okajima
+ * Copyright (c) 2005  Arun M. Krishnakumar
+ * Copyright (c) 2004-2006 David P. Quigley
+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
+ * Copyright (c) 2003  Puja Gupta
+ * Copyright (c) 2003  Harikesavan Krishnan
+ * Copyright (c) 2003-2007 Stony Brook University
+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "union.h"
+
+/* Make sure our rdstate is playing by the rules. */
+static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
+{
+   BUG_ON(rdstate->offset >= DIREOF);
+   BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
+}
+
+struct unionfs_getdents_callback {
+   struct unionfs_dir_state *rdstate;
+   void *dirent;
+   int entries_written;
+   int filldir_called;
+   int filldir_error;
+   filldir_t filldir;
+   struct super_block *sb;
+};
+
+/* based on generic filldir in fs/readdir.c */
+static int unionfs_filldir(void *dirent, const char *name, int namelen,
+  loff_t offset, u64 ino, unsigned int d_type)
+{
+   struct unionfs_getdents_callback *buf = dirent;
+   struct filldir_node *found = NULL;
+   int err = 0;
+   int is_wh_entry = 0;
+
+   buf->filldir_called++;
+
+   if ((namelen > UNIONFS_WHLEN) &&
+   !strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
+   name += UNIONFS_WHLEN;
+   namelen -= UNIONFS_WHLEN;
+   is_wh_entry = 1;
+   }
+
+   found = find_filldir_node(buf->rdstate, name, namelen, is_wh_entry);
+
+   if (found) {
+   /*
+* If we had non-whiteout entry in dir cache, then mark it
* as a whiteout but leave it in the dir cache.
+*/
+   if (is_wh_entry && !found->whiteout)
+   found->whiteout = is_wh_entry;
+   goto out;
+   }
+
+   /* if 'name' isn't a whiteout, filldir it. */
+   if (!is_wh_entry) {
+   off_t pos = rdstate2offset(buf->rdstate);
+   u64 unionfs_ino = ino;
+
+   err = buf->filldir(buf->dirent, name, namelen, pos,
+  unionfs_ino, d_type);
+   buf->rdstate->offset++;
+   verify_rdstate_offset(buf->rdstate);
+   }
+   /*
+* If we did fill it, stuff it in our hash, otherwise return an
+* error.
+*/
+   if (err) {
+   buf->filldir_error = err;
+   goto out;
+   }
+   buf->entries_written++;
+   err = add_filldir_node(buf->rdstate, name, namelen,
+  buf->rdstate->bindex, is_wh_entry);
+   if (err)
+   buf->filldir_error = err;
+
+out:
+   return err;
+}
+
+static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+   int err = 0;
+   struct file *lower_file = NULL;
+   struct inode *inode = NULL;
+   struct unionfs_getdents_callback buf;
+   struct unionfs_dir_state *uds;
+   int bend;
+   loff_t offset;
+
+   unionfs_read_lock(file->f_path.dentry->d_sb);
+
+   err = unionfs_file_revalidate(file, false);
+   if (unlikely(err))
+   goto out;
+
+   inode = file->f_path.dentry->d_inode;
+
+   uds = UNIONFS_F(file)->rdstate;
+   if (!uds) {
+   if (file->f_pos == DIREOF) {
+   goto out;
+   } else if (file->f_pos > 0) {
+   uds = find_rdstate(inode, file->f_pos);
+   if (unlikely(!uds)) {
+   err = -ESTALE;
+   goto out;
+   }
+   UNIONFS_F(file)->rdstate = uds;
+   } else {
+   init_rdstate(file);
+   uds = UNIONFS_F(file)->rdstate;
+   }
+   }
+   bend = fbend(file);
+
+   while (uds->bindex <= bend) {
+   lower_file = unionfs_lower_file_idx(file, uds->bindex);
+   if (!lower_file) {
+   uds->bindex++;
+   uds->dirpos = 0;
+   continue;
+   }
+
+   /* prepare callback buffer 

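unionfs_filldir() strips the whiteout prefix, skips any name already recorded from a higher-priority branch, and records whiteouts so that the same name in lower branches stays hidden. A self-contained sketch of that merge over in-memory name lists, assuming a linear "seen" table in place of the rdstate hash and the ".wh." prefix shown in concepts.txt. It is simplified (for instance, it does not flag an already-shown entry when a later whiteout for it turns up, as unionfs_filldir does), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WH_PREFIX_SKETCH ".wh."
#define WH_LEN_SKETCH (sizeof(WH_PREFIX_SKETCH) - 1)
#define MAX_SEEN_SKETCH 64

/* A name already decided on (either shown, or hidden by a whiteout). */
struct seen_sketch {
	const char *name;
	size_t len;
};

static int seen_before(const struct seen_sketch *seen, int nseen,
		       const char *name, size_t len)
{
	for (int i = 0; i < nseen; i++)
		if (seen[i].len == len && strncmp(seen[i].name, name, len) == 0)
			return 1;
	return 0;
}

/*
 * Merge listings from branches[0] (highest priority) to branches[n-1];
 * each listing is a NULL-terminated array of names.  Visible names go
 * to out[]; returns how many were written, or -1 if a table overflows.
 */
static int merge_readdir_sketch(const char *const *const *branches,
				int nbranches, const char **out, int max_out)
{
	struct seen_sketch seen[MAX_SEEN_SKETCH];
	int nseen = 0, nout = 0;

	assert(branches != NULL && out != NULL);
	for (int b = 0; b < nbranches; b++) {
		for (const char *const *e = branches[b]; *e; e++) {
			const char *name = *e;
			size_t len = strlen(name);
			int is_wh = 0;

			/* ".wh.foo" is a whiteout for "foo" */
			if (len > WH_LEN_SKETCH &&
			    strncmp(name, WH_PREFIX_SKETCH, WH_LEN_SKETCH) == 0) {
				name += WH_LEN_SKETCH;
				len -= WH_LEN_SKETCH;
				is_wh = 1;
			}
			/* a higher-priority branch already decided this name */
			if (seen_before(seen, nseen, name, len))
				continue;
			if (!is_wh) {
				if (nout >= max_out)
					return -1;
				out[nout++] = name;
			}
			if (nseen >= MAX_SEEN_SKETCH)
				return -1;
			seen[nseen].name = name;
			seen[nseen].len = len;
			nseen++;
		}
	}
	return nout;
}
```

With branch 0 holding "a" and ".wh.b" and branch 1 holding "a", "b" and "c", the merged listing is "a" and "c": the duplicate "a" is eliminated and "b" is masked by the whiteout.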
[PATCH 01/42] Unionfs: filesystems documentation index

2007-12-09 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 Documentation/filesystems/00-INDEX |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 1de155e..b168331 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -96,6 +96,8 @@ udf.txt
- info and mount options for the UDF filesystem.
 ufs.txt
- info on the ufs filesystem.
+unionfs/
+   - info on the unionfs filesystem
 vfat.txt
- info on using the VFAT filesystem used in Windows NT and Windows 95
 vfs.txt
-- 
1.5.2.2



[UNIONFS] 00/42 Unionfs and related patches review

2007-12-09 Thread Erez Zadok

Al, Christoph, and Andrew,

As per your request, I'm posting for review the unionfs code (and related
code) that's in my korg tree against mainline (v2.6.24-rc4-190-g94545ba).
This code is nearly identical to what's in -mm (the mm code has a couple of
additional things that depend on mm-specific patches that aren't in mainline
yet).

I really tried to keep this message short, by offering pointers to more
info, but still there's a bunch of info here.

Andrew, you've asked me to list the main issues that came up in
discussions regarding unionfs, and how they were addressed.  So I've
reviewed my notes from OLS'06, LSF'07, and OLS'07, as well as assorted
postings in mailing lists, and I came up with this prioritized list (in
descending priority order):

1. cache coherency
2. nameidata handling
3. namespace pollution
4. use of ioctls for branch management

(1) Cache coherency: by far, the biggest concern had been around cache
coherency: what happens if someone modifies a lower object
(file/dir/etc.).  I met with Mike Halcrow in October and we discussed
stacking in general; Mike also emphasized that cache-coherency was one
of his most pressing concerns in ecryptfs.

At OLS'06, several suggestions were made, including fancy tricks to hide the
lower namespace or "lock" it so users have readonly access.  None of these
solutions would have been able to easily handle the problem of an existing
open file descriptor on a lower file, and they might have required
significant VFS changes.  Moreover, unionfs users actually want to modify
lower branches directly, and then be able to see their changes reflected in
the union immediately.  So we explored a number of ideas.  We feel that the
VFS is already complex enough, so we tried our best to handle cache-coherency inside
unionfs.  The solution we have implemented is to compare the mtime/ctime of
upper/lower objects during revalidation (esp. of dentries); and if the lower
times are newer, we reconstruct the union object (drop the older objects,
and re-lookup them).  This time-based cache-coherency works well and is
similar to the NFS model.  Because Unionfs users tend to have a burst of
activity on lower branches, our current cache-coherency also defers the
revalidation actions until absolutely needed, so this idea tends to also be
more efficient for the common usage patterns.  More details about how we
handle cache-coherency are available in our
Documentation/filesystems/unionfs/concepts.txt file.
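A minimal sketch of the time-based check described above, in C. The struct and function names (`obj_times`, `ts_after`, `upper_is_stale`) are hypothetical and not taken from unionfs; the sketch only assumes that the union object caches the mtime/ctime it last saw, and that a strictly newer lower time means the union object must be dropped and looked up again.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cached timestamps for one object (not unionfs's real structs). */
struct obj_times {
	struct timespec mtime;
	struct timespec ctime;
};

/* True if timestamp a is strictly later than timestamp b. */
static bool ts_after(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec > b.tv_sec;
	return a.tv_nsec > b.tv_nsec;
}

/*
 * True if the lower object changed after the union object recorded its
 * times, i.e. the union object is stale and should be rebuilt on the next
 * revalidation (deferred until it is actually needed).
 */
static bool upper_is_stale(const struct obj_times *upper,
			   const struct obj_times *lower)
{
	return ts_after(lower->mtime, upper->mtime) ||
	       ts_after(lower->ctime, upper->ctime);
}
```

Comparing both mtime and ctime catches content changes as well as metadata-only changes (e.g. chmod) made directly on a lower branch.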

That said, we're now developing some VFS patches that would allow lower file
systems to more directly inform the upper objects about such (mtime)
changes.  We're exploring a couple of different options but our key goals
are to (a) minimize VFS changes and (b) avoid any changes to lower file
systems.

(2) nameidata handling.  Another important question raised (esp. by NFS
people) was how we handle struct nameidata.  The VFS passes nameidata
structs to file systems, and some file systems use that.  We used to
either pass NULL or the upper nd to the lower f/s.  That caused NULL
de-refs inside nfsv4, among other problems.  We now create our own
nameidata structure, fill it up as needed (esp. for intent data), and
pass it down.  We do this every time we call any VFS function that takes
a nameidata (e.g., vfs_create).  This seems to work well.

There's been some discussion on lkml about splitting struct nameidata in
two, one of which would handle just the intent information.  I'd like to see
that happen, maybe even help, because right now we pass a whole large-ish
struct nameidata for just a couple of intent bits of information that the
lower f/s needs.

(3) namespace pollution.  Unioning readonly and readwrite directories
requires the ability to mask, or white-out, files that are being deleted
from a readonly directory.  Unionfs does this in a portable way, by
creating .wh.XXX files to indicate that file XXX has been whited-out.
This works well on many file systems, but it tends to clutter lower
branches with these .wh.* files.  We recently optimized our whiteout
creation algorithm so it minimizes the number of conditions in which
whiteouts are created, and that helped some people a lot.  But still, if
you unify a readonly and writeable branch, and you try to delete a file
from the readonly branch/medium, there's no way to avoid creating some
sort of a whiteout.  BTW, of course, these whiteouts are completely
hidden from the view of the user who accesses files/dirs via the union.
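The `.wh.XXX` naming convention can be illustrated with a few helpers. This is a sketch, not unionfs's implementation: the helper names are hypothetical, and only the `.wh.` prefix comes from the description above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Whiteout marker prefix, per the .wh.XXX convention described above. */
#define WHITEOUT_PREFIX ".wh."
#define WHITEOUT_PREFIX_LEN (sizeof(WHITEOUT_PREFIX) - 1)

/* Build the whiteout name for `name` into `buf`; false if it does not fit. */
static bool make_whiteout_name(const char *name, char *buf, size_t buflen)
{
	int n = snprintf(buf, buflen, "%s%s", WHITEOUT_PREFIX, name);

	return n >= 0 && (size_t)n < buflen;
}

/* True if `name` is a whiteout entry that must be hidden from union readers. */
static bool is_whiteout_name(const char *name)
{
	return strncmp(name, WHITEOUT_PREFIX, WHITEOUT_PREFIX_LEN) == 0;
}

/* The name being masked (the XXX in .wh.XXX), or NULL if not a whiteout. */
static const char *whiteout_target(const char *name)
{
	return is_whiteout_name(name) ? name + WHITEOUT_PREFIX_LEN : NULL;
}
```

A readdir over the union would skip every entry for which `is_whiteout_name()` is true and also suppress any entry named by `whiteout_target()` found in lower-priority branches.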

In the long run, we really hope to see native whiteout support in Linux (à la
BSD).  Of course, this would require a change to the VFS and several native
file systems (possibly even a change to the on-disk format), so we realize
that this isn't likely to happen soon.  If/when native whiteout support becomes
available, unionfs could easily use it.  Until that time, we have lots of
users who want to use unionfs on top of numerous 

Re: [lm-sensors] 2.6.24-rc4 hwmon it87 probe fails

2007-12-09 Thread Shaohua Li

On Sun, 2007-12-09 at 23:04 +0100, Adrian Bunk wrote:
> On Sun, Dec 09, 2007 at 04:12:25PM -0500, Elvis Pranskevichus wrote:
> > Jean Delvare wrote:
> > 
> > > Hi Mike,
> > > 
> > > On Sat, 8 Dec 2007 21:22:34 -0500, Mike Houston wrote:
> > >> On Sun, 9 Dec 2007 01:05:54 +0100
> > >> Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > >> 
> > >> > On Tue, Dec 04, 2007 at 09:51:54PM -0500, Mike Houston wrote:
> > >> > > I finally got around to testing Linux 2.6.24 (2.6.24-rc4) and
> > >> > > found that the it87 driver fails to probe and consequently, my
> > >> > > sensors no longer work. This was fine with Linux 2.6.23.8 (the
> > >> > > last kernel I was using)
> > >> > > 
> > >> > > The necessary modules load, but:
> > >> > > 
> > >> > > it87: Found IT8718F chip at 0x290, revision 2
> > >> > > it87: in3 is VCC (+5V)
> > >> > > it87 it87.656: Failed to request region 0x290-0x297
> > >> > > it87: probe of it87.656 failed with error -16
> > >> > > 
> > >> > > Coretemp still works.
> > >> > > 
> > >> > > It appears it has something to do with the ioport range being
> > >> > > reserved for some reason:
> > >> > > 
> > >> > > system 00:01: ioport range 0x290-0x29f has been reserved
> > >> 
> > >> > 
> > >> > Thanks for your report.
> > >> > 
> > >> > Please also provide:
> > >> > - dmesg from 2.6.23.8
> > >> > - The output of "cat /proc/ioports" for both kernels
> > >> 
> > >> Thanks Adrian, here is the information you have requested, for
> > >> both kernels (I have 2.6.23.9 now though where it87 still works)
> > >> 
> > >> Linux 2.6.23.9:
> > >> http://www.mikeserv.com/temp/proc_ioports-2.6.23.9.txt
> > >> http://www.mikeserv.com/temp/dmesg-2.6.23.9.txt
> > >> http://www.mikeserv.com/temp/config-2.6.23.9.txt
> > >> 
> > >> Linux 2.6.24-rc4:
> > >> http://www.mikeserv.com/temp/proc_ioports-2.6.24-rc4.txt
> > >> http://www.mikeserv.com/temp/dmesg-2.6.24-rc4.txt
> > > 
> > > This one shows:
> > > 
> > > system 00:01: ioport range 0x290-0x29f has been reserved
> > > (...)
> > > system 00:01: ioport range 0x290-0x294 has been reserved
> > > 
> > > This is clearly not correct as both areas overlap. The second
> > > reservation is responsible for the it87 breakage, because it conflicts
> > > with what the it87 driver later attempts to request (0x290-0x297). The
> > > first is wrong as well (the IT87xxF environment controller I/O area is
> > > 8 port wide, not 16) but shouldn't be a problem in practice.
> > > 
> > > These port reservations weren't happening in 2.6.23.9 according to your
> > > dmesg output for that kernel. I don't know what changed in this area
> > > since 2.6.23.9, maybe Bjorn or Adam (Cc'd) can tell.
> > > 
> > 
> > Hi,
> > 
> > I have exactly the same problem here on a Gigabyte GA-965G-DS3 motherboard
> > based box:
> > 
> > it87: Found IT8718F chip at 0x290, revision 1
> > it87: in3 is VCC (+5V)
> > it87 it87.656: Failed to request region 0x290-0x297
> > it87: probe of it87.656 failed with error -16
> > 
> > git bisecting revealed the offending commit:
> > 
> > a7839e960675b54: PNP: increase the maximum number of resources
> > 
> > Happened between rc3 and rc4.
> 
> Thanks for doing the work of bisecting!
> 
> > > Either way, the overlapping areas smell like a BIOS bug, meaning that
> > > you should look for an updated BIOS for your system first.
> > > 
> > >> http://www.mikeserv.com/temp/config-2.6.24-rc4.txt
> > > 
> > 
> > This indeed looks like a broken ACPI BIOS since the aforementioned commit
> > touches only the PNP ACPI driver. I'm not sure how to work around this,
> > though. Ideas?
> 
> People responsible for this commit + ACPI maintainer added to Cc.
This problem should also exist in previous kernels (before we removed the
ACPI motherboard driver). Basically, it's a broken BIOS. Could the patch
below work around it?

Thanks,
Shaohua

Index: linux/drivers/pnp/system.c
===
--- linux.orig/drivers/pnp/system.c 2007-12-10 10:17:46.0 +0800
+++ linux/drivers/pnp/system.c  2007-12-10 10:24:42.0 +0800
@@ -22,7 +22,7 @@ static const struct pnp_device_id pnp_de
{"", 0}
 };
 
-static void reserve_range(struct pnp_dev *dev, resource_size_t start,
+static struct resource* reserve_range(struct pnp_dev *dev, resource_size_t start,
  resource_size_t end, int port)
 {
char *regionid;
@@ -31,16 +31,14 @@ static void reserve_range(struct pnp_dev
 
regionid = kmalloc(16, GFP_KERNEL);
if (!regionid)
-   return;
+   return NULL;
 
snprintf(regionid, 16, "pnp %s", pnpid);
if (port)
res = request_region(start, end - start + 1, regionid);
else
res = request_mem_region(start, end - start + 1, regionid);
-   if (res)
-   res->flags &= ~IORESOURCE_BUSY;
-   else
+   if (!res)
kfree(regionid);
 
/*
@@ -52,12 +50,17 @@ static void reserve_range(struct pnp_dev
port ? 

Re: [PATCH 3/3] Fix use of skb after netif_rx

2007-12-09 Thread Wang Chen
Julia Lawall said the following on 2007-12-10 4:05:
> From: Julia Lawall <[EMAIL PROTECTED]>
> // 
> @@
> expression skb, e,e1;
> @@
> 
> (
>  netif_rx(skb);
> |
>  netif_rx_ni(skb);
> )
>   ... when != skb = e
> (
>   skb = e1
> |
> * skb
> )
> // 
> 
> diff a/drivers/s390/net/ctcmain.c b/drivers/s390/net/ctcmain.c
> diff a/drivers/s390/net/netiucv.c b/drivers/s390/net/netiucv.c

Julia, seems that your semantic patch misses following place.

drivers/s390/net/qeth_main.c:2733
...
#endif
rxrc = netif_rx(skb);
card->dev->last_rx = jiffies;
card->stats.rx_packets++;
card->stats.rx_bytes += skb->len;
...
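The class of bug under discussion is reading `skb` after `netif_rx()` has taken ownership of it (here, `skb->len` in the stats update). A userspace sketch of the usual fix, saving the length before handing the buffer off, follows; `fake_skb`, `fake_netif_rx` and `rx_stats` are hypothetical stand-ins, not kernel types, and the fake receive function simply frees the buffer to model the worst case.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for sk_buff and the driver's stats (not kernel types). */
struct fake_skb {
	unsigned int len;
};

struct rx_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
};

/* Models netif_rx(): ownership passes to the callee, which may free the buffer. */
static int fake_netif_rx(struct fake_skb *skb)
{
	free(skb);
	return 0;
}

/* Read everything needed from skb *before* handing it off; never touch it after. */
static int deliver(struct fake_skb *skb, struct rx_stats *stats)
{
	unsigned int len = skb->len;	/* saved while we still own skb */
	int rc = fake_netif_rx(skb);	/* skb must be treated as gone from here on */

	stats->rx_packets++;
	stats->rx_bytes += len;		/* uses the saved copy, not skb->len */
	return rc;
}
```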

--
WCN



Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Tejun Heo
Robert Hancock wrote:
> And you're quite right in your comment that we are often too quick to
> blacklist hardware instead of looking into why it really is failing.
> ACPI is one of those areas where we often just need to figure out how to
> be bug-to-bug compatible with what Windows is doing..

In the spirit of not blacklisting without looking deep into ACPI code,
can somebody familiar with ASL take a look at comment 11 of bug 9320?

  http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11

This is libata calling _GTM to find out how the BIOS configured the
device to determine cable type.

Thanks.

-- 
tejun


Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Tejun Heo
Andreas Mohr wrote:
> As such one can conclude that this BIOS is rather very confused when being
> called for _GTM on an entirely unused controller port. And this is either
> because the BIOS is dumb or because ACPI doesn't really expect anyone to
> call _GTM on an unused physical port. I'd bet on the latter...
> (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)

Thanks a lot for finding this out.  One of the two reports in bug 9320
seems to be the same problem although the other doesn't seem to be.  So,
it seems we'll have to check that both primary and secondary slots are
empty and skip _GTM if so.  :-(

Also, right, there's no need to fail suspend on _GTM failure whatever
the error is.  That was me being anal again.  Will incorporate both into
the ACPI fixes patchset.
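A sketch of the proposed rule, skipping _GTM when a channel has no devices attached, assuming a hypothetical per-channel presence flag for the master and slave slots (these are not libata's actual data structures):

```c
#include <stdbool.h>

/* Hypothetical per-channel view: which of the two device slots are populated. */
struct ide_channel_state {
	bool dev_present[2];	/* [0] = master, [1] = slave */
};

/*
 * Evaluate _GTM only if at least one device is attached; on an empty
 * channel the timing values are irrelevant and some BIOSes fail the method.
 */
static bool should_call_gtm(const struct ide_channel_state *ch)
{
	return ch->dev_present[0] || ch->dev_present[1];
}
```

Per the second point above, a caller would also treat a _GTM failure on a populated channel as non-fatal rather than aborting suspend.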

Thanks.

-- 
tejun


kernel logger in lower than audit or klogd

2007-12-09 Thread Mohsen Pahlevanzadeh

Dear all,
I want to test an IDS, so I need a kernel logger that works at a lower level
than audit or klogd, i.e. one that logs below the syscall layer.
Please help me.
Cheers,
--
-
Mohsen Pahlevanzadeh
email address : [EMAIL PROTECTED]
web site : http://pahlevanzadeh.org
IRC IM : m_pahlevanzadeh
yahoo IM : linuxorbsd




PROBLEM: /proc/cpuinfo reports erroneous CPU frequency. (2.6.23 & 2.6.22)

2007-12-09 Thread Alexander Rajula

While using 2.6.23 and 2.6.22 (earlier kernels have not been tested) 
/proc/cpuinfo reports the wrong CPU frequency:


While overclocking an AMD Athlon X2 (2GHz) CPU, /proc/cpuinfo reports the wrong
CPU frequency.
I am quite puzzled by this.
Is this an error in the kernel, or is there something strange going on?


Output from /proc/cpuinfo (processor at 3.0GHz in BIOS):

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 75
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping: 2
cpu MHz : 2300.043
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 
3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips: 4602.12
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 75
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping: 2
cpu MHz : 2300.043
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 
3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips: 4599.31
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm st



Output from scripts/ver_linux:

Linux dubbelmacka 2.6.23-gentoo-r3 #3 SMP PREEMPT Mon Dec 10 00:46:09 CET 2007
x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux

Gnu C  4.1.2
Gnu make   3.81
binutils   Binutils
util-linux 2.12r
mount  2.12r
module-init-tools  3.2.2
e2fsprogs  1.39
reiserfsprogs  3.6.19
reiser4progs   1.0.5
PPP2.4.4
Linux C Library2.5
Dynamic linker (ldd)   2.5
Procps 3.2.7
Net-tools  1.60
Kbd1.13
Sh-utils   6.9
udev   115
Modules Loaded nvidia



Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Linus Torvalds


On Sun, 9 Dec 2007, Alan Cox wrote:
> 
> The one-off regression is probably not one-off, but this is IDE so
> actually it's quite probable it's a single broken firmware. 
>
> The alternative is that you cripple just about every user of various
> other standards compliant devices and controllers whose hardware we
> finally fixed.

Alan, you're so full of shit that it's not even funny.

Have you even *read* the thread?

Tejun already reported that this apparently gets fixed _properly_ with the 
more extensive cleanups and fixes that are pending for 2.6.25.

In other words, the stuff you call so critically important (yet we've been 
able to live without it until now!) is apparently simply NOT YET READY. 
It's breaking things.

In this case, Tejun seems to be right on the money.  I also agree 100% 
with him when he says

   "Blacklist takes time to develop and temporary blacklist for just one
release doesn't sound like a good idea."

because if we create some blacklist for that one reported device, not only 
is it likely going to be wrong (it's almost never just one firmware or one 
chip that has a particular issue), but we tend to create these blacklists 
and later realize that we shouldn't have blacklisted things at all, we 
should just have done things differently.

For examples of that, see the NCQ blacklist that was just _us_ doing 
things wrong (over-reacting to things we shouldn't care about), and 
there's currently another totally unrelated discussion on a very similar 
thing wrt libata and the ACPI startup commands for an unused controller 
port.

> Finally you need to remember that the 'regression' is caused by the fact
> we now do the _right_ thing both in terms of 'old IDE' and specs.

.. and what the hell does that matter? If the code doesn't work, it 
doesn't work, and you might as well point to some random scribblings done 
by a three-year-old on toilet paper rather than any "specs".

Real life matters more. Regressions matter more.

We apparently do have a full fix, but it seems to be too invasive for 
2.6.24, which means that the thing that currently DOES NOT WORK and 
causes regressions should be reverted, so that 2.6.24 is at least no worse 
than 2.6.23 (and all earlier kernels) in this respect.

And then we should just hope that the more complete fix that Tejun has 
doesn't cause any issues on its own. I would suggest that if you care so 
deeply about this issue, you press Fedora into putting Tejun's tree into 
Fedora testing, and get that thing tested out extensively.

So the fact is, we have a way forward, but we should *not* take steps 
backwards just because you want to push something out that isn't quite 
ready. We should revert the change that causes the current trouble, safe 
in the knowledge (or at least "strong hope") that we have a way forward 
that makes *both* 2.6.24 and 2.6.25 be continual improvements.

We used to allow regressions. It was really painful. It's hard to debug 
things when things sometimes break. It's much better to have a nice 
constant monotonic improvement.

It's better for users, but it's much better also for developers, even if 
you may be frustrated right now because some new code effectively gets 
shut down until it works for everybody.

Linus


Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

2007-12-09 Thread Yasunori Goto
> Yasunori Goto wrote:
> >> On Thu, 6 Dec 2007, Geert Uytterhoeven wrote:
> >> > On Thu, 6 Dec 2007, Yasunori Goto wrote:
> >> > > > I'll try Milton's suggestion to pre-allocate the memory early.  It seems
> >> > > > that should work as long as nothing else before the hot-plug mem is added
> >> > > > needs a large chunk.
> >> > > 
> >> > > Hello. Geoff-san. Sorry for late response.
> >> > > 
> >> > > Could you tell me the value of the following page_size calculation
> >> > > in vmemmap_populate()? I think this page_size may be too big a value. 
> >> > > 
> >> > > --
> >> > > int __meminit vmemmap_populate(struct page *start_page,
> >> > >                                unsigned long nr_pages, int node)
> >> > > :
> >> > > :
> >> > >         unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> >> > > :
> >> > > ---
> >> 
> >> 16 MiB of course.
> > 
> > 16 MiB is not page size. It is "section size". 
> > IIRC, powerpc's page size must be 4K (or 64K).
> > If page size is 4k, vmemmap_alloc_block will request an order 12 allocation.
> 
> 
> By default PS3 uses 4K virtual pages, and 16M linear pages.
> 
> 
> > Is it really correct value for vmemmap population?
> 
> 
> It seems vmemmap needs linear pages, so I think it is ok.

Oh, I see. Sorry for noise.

Bye.

-- 
Yasunori Goto 
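The arithmetic behind the order-12 remark can be checked with a small helper: with 4K base pages, a 16 MiB block is 4096 = 2^12 pages. The helper below follows the same idea as the kernel's `get_order()`, but it is a standalone sketch; `order_for_size` is a hypothetical name.

```c
/*
 * Smallest allocation order (power-of-two count of base pages) whose total
 * size covers `size` bytes, given base pages of (1 << page_shift) bytes.
 */
static unsigned int order_for_size(unsigned long size, unsigned int page_shift)
{
	unsigned int order = 0;

	while ((1UL << (page_shift + order)) < size)
		order++;
	return order;
}
```

For the PS3 numbers in this thread, `order_for_size(1UL << 24, 12)` gives 12; with 64K base pages (`page_shift` 16) the same 16 MiB block would be order 8.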




Re: [lm-sensors] 2.6.24-rc4 hwmon it87 probe fails

2007-12-09 Thread Ed Sweetman

Mike Houston wrote:
> On Sun, 9 Dec 2007 23:42:15 +0100
> Jean Delvare <[EMAIL PROTECTED]> wrote:
>
>> On Sun, 09 Dec 2007 16:12:25 -0500, Elvis Pranskevichus wrote:
>>
>>> This indeed looks like a broken ACPI BIOS since the
>>> aforementioned commit touches only the PNP ACPI driver. I'm not
>>> sure how to work around this, though. Ideas?
>>
>> Complaining to Gigabyte seems to be the best approach.
>
> I just happen to have a Windows Vista installation on this box as
> well, and I just thought to check. Sorry, I wish I'd have thought of
> it sooner but I don't go there often. You folks might be interested
> to know that Windows appears to have the same silly problem with the
> i/o resources (from Device Manager):
>
> [00290 - 00294]  Motherboard resources
> [00290 - 0029F]  Motherboard resources
>
> I don't have anything that reads sensors in Windows though, so I
> couldn't tell you if it could access that it87 chip or not.
>
> So this pretty much confirms that it's a motherboard/bios issue.
>
> Mike Houston

I'm seeing this exact problem on an Asus Nforce4 based board. Prior to 
moving to 2.6.24-rc4 it worked just fine.  No additional acpi options 
were selected in kernel config.  

So add Asus A8N-E to the list of broken pnpacpi
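The conflict described in this thread reduces to an interval check: the BIOS-declared region 0x290-0x294 intersects the 0x290-0x297 window that it87 requests, so the request fails with -EBUSY (error -16). A minimal sketch of that check with inclusive bounds; `port_range` and `ranges_overlap` are hypothetical names, not the kernel's resource API:

```c
#include <stdbool.h>

/* Inclusive I/O port range, e.g. 0x290-0x297. */
struct port_range {
	unsigned int start;
	unsigned int end;
};

/* Two inclusive ranges overlap when each starts no later than the other ends. */
static bool ranges_overlap(struct port_range a, struct port_range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

With `{0x290, 0x294}` reserved, any request covering 0x290-0x297 overlaps it, while a hypothetical range starting at 0x295 or later would not.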



Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Robert Hancock

Andreas Mohr wrote:

On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:

IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
actually wrongly implemented but simply expects IDE controller values
to have been set up ""differently"".


Or... one could possibly even infer from this that - maybe -
the _GTM invocation spot is wrong, it should be done somewhere
different during bootup. Or whatever.


"Whatever" indeed:

There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
and the possible register values are:

Package (0x04)
{
0x20,
0x31,
0x65,
0xA8
},

and from

OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
Field (CFG2, DWordAcc, NoLock, Preserve)
{
    Offset (0x08),
    SSPT,   8,
    SMPT,   8,
    PSPT,   8,
    PMPT,   8,
    Offset (0x10),
...
we can infer that at PCI_Config offset 0x48 those values should be located.
However after bootup or resume there are:

# lspci -s 00:11.1 -xxx
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00


As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
99 99 20 20, which are not quite entirely valid judging from the array above,
and this is because the secondary port is unused, as can also be seen
from my bootup log:

scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
ata1.00: 234441648 sectors, multi 16: LBA
ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
Switched to high resolution mode on CPU 0
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)


Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure
message on suspend, only for it to reappear right on resume due to the
99 99 20 20 combo happening again.
If I don't tweak, I get _GTM failure at both suspend and resume.


As such one can conclude that this BIOS is rather very confused when being
called for _GTM on an entirely unused controller port. And this is either
because the BIOS is dumb or because ACPI doesn't really expect anyone to
call _GTM on an unused physical port. I'd bet on the latter...
(however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)

Andreas Mohr
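The lspci dump can be decoded against the ASL quoted above. A sketch, assuming the CFG2 field layout as given (region base 0x40, fields starting at +0x08, so SSPT, SMPT, PSPT and PMPT sit at config offsets 0x48-0x4B) and assuming the four values in the Match() package apply to every port, although the ASL excerpt only shows them for PMPT. The names below are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values accepted by the BIOS's Match() on PMPT, as quoted above. */
static const uint8_t valid_port_timings[] = { 0x20, 0x31, 0x65, 0xA8 };

/* Config-space offsets implied by the CFG2 OperationRegion/Field layout. */
enum {
	CFG_SSPT = 0x48,	/* secondary slave  */
	CFG_SMPT = 0x49,	/* secondary master */
	CFG_PSPT = 0x4A,	/* primary slave    */
	CFG_PMPT = 0x4B,	/* primary master   */
};

/* True if a timing byte is one the BIOS's Match() would recognize. */
static bool timing_is_valid(uint8_t v)
{
	for (size_t i = 0; i < sizeof(valid_port_timings); i++)
		if (valid_port_timings[i] == v)
			return true;
	return false;
}
```

Applied to the `40:` row of the dump, the secondary-channel bytes (0x99, 0x99) fail the lookup while the primary-channel bytes (0x20, 0x20) pass, which matches _GTM failing only for CHN1.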



Probably Windows doesn't call _GTM on a port with no devices connected, 
and so the BIOS people never tested that case. Likely we can just avoid 
doing this - if no devices are connected the timing settings for that 
channel are irrelevant..


And you're quite right in your comment that we are often too quick to 
blacklist hardware instead of looking into why it really is failing. 
ACPI is one of those areas where we often just need to figure out how to 
be bug-to-bug compatible with what Windows is doing..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/




[patch 1/1] Convert the semaphore to a mutex in net/tipc/socket.c

2007-12-09 Thread Kevin Winchester
To: Andrew Morton <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Per Liden <[EMAIL PROTECTED]>
Cc: Jon Maloy <[EMAIL PROTECTED]>
Cc: Allan Stephens <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Cc: linux-kernel@vger.kernel.org

Note also that in the release method, down_interruptible() was being called
without checking the return value.  I converted it to mutex_lock_interruptible()
and made the interrupted case return -ERESTARTSYS, as was done for all other
calls to down_interruptible() in the file.
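The locking discipline the patch enforces (check the result of the interruptible lock, and unlock on every exit path that actually took it) can be sketched in userspace. pthreads has no signal-interruptible mutex, so a hypothetical `signal_pending` flag stands in for an interrupted wait, and `-EINTR` stands in for the kernel's `-ERESTARTSYS`; none of these names come from net/tipc/socket.c.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct sketch_sock {
	pthread_mutex_t lock;
	bool signal_pending;	/* hypothetical: simulates an interrupted wait */
	bool has_sk;		/* mirrors the `if (!sock->sk)` check in release() */
};

/* Stand-in for mutex_lock_interruptible(): 0 on success, negative on error. */
static int lock_interruptible(struct sketch_sock *s)
{
	if (s->signal_pending)
		return -EINTR;	/* the kernel code returns -ERESTARTSYS here */
	return -pthread_mutex_lock(&s->lock);
}

/* Mirrors the patched release(): check the lock result, unlock on every path. */
static int sketch_release(struct sketch_sock *s)
{
	int err = lock_interruptible(s);

	if (err)
		return err;	/* lock not held: must not unlock */

	if (!s->has_sk) {
		pthread_mutex_unlock(&s->lock);
		return 0;
	}

	s->has_sk = false;	/* stands in for the real teardown work */
	pthread_mutex_unlock(&s->lock);
	return 0;
}
```

The key difference from the old code is the early return: the unchecked `down_interruptible()` could proceed, and later call `up()`, without ever having taken the semaphore.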

Signed-off-by: Kevin Winchester <[EMAIL PROTECTED]>

---
 net/tipc/socket.c |   56 +++---
 1 file changed, 28 insertions(+), 28 deletions(-)

Index: v2.6.24-rc4/net/tipc/socket.c
===
--- v2.6.24-rc4.orig/net/tipc/socket.c
+++ v2.6.24-rc4/net/tipc/socket.c
@@ -63,7 +63,7 @@
 struct tipc_sock {
struct sock sk;
struct tipc_port *p;
-   struct semaphore sem;
+   struct mutex lock;
 };
 
 #define tipc_sk(sk) ((struct tipc_sock*)sk)
@@ -217,7 +217,7 @@ static int tipc_create(struct net *net, 
tsock->p = port;
port->usr_handle = tsock;
 
-   init_MUTEX(&tsock->sem);
+   mutex_init(&tsock->lock);
 
dbg("sock_create: %x\n",tsock);
 
@@ -253,9 +253,10 @@ static int release(struct socket *sock)
dbg("sock_delete: %x\n",tsock);
if (!tsock)
return 0;
-   down_interruptible(&tsock->sem);
+   if (mutex_lock_interruptible(&tsock->lock))
+   return -ERESTARTSYS;
if (!sock->sk) {
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
return 0;
}
 
@@ -288,7 +289,7 @@ static int release(struct socket *sock)
atomic_dec(&tipc_queue_size);
}
 
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
 
sock_put(sk);
 
@@ -315,7 +316,7 @@ static int bind(struct socket *sock, str
struct sockaddr_tipc *addr = (struct sockaddr_tipc *)uaddr;
int res;
 
-   if (down_interruptible(&tsock->sem))
+   if (mutex_lock_interruptible(&tsock->lock))
return -ERESTARTSYS;
 
if (unlikely(!uaddr_len)) {
@@ -346,7 +347,7 @@ static int bind(struct socket *sock, str
res = tipc_withdraw(tsock->p->ref, -addr->scope,
&addr->addr.nameseq);
 exit:
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
return res;
 }
 
@@ -367,7 +368,7 @@ static int get_name(struct socket *sock,
struct sockaddr_tipc *addr = (struct sockaddr_tipc *)uaddr;
u32 res;
 
-   if (down_interruptible(&tsock->sem))
+   if (mutex_lock_interruptible(&tsock->lock))
return -ERESTARTSYS;
 
*uaddr_len = sizeof(*addr);
@@ -380,7 +381,7 @@ static int get_name(struct socket *sock,
res = tipc_ownidentity(tsock->p->ref, &addr->addr.id);
addr->addr.name.domain = 0;
 
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
return res;
 }
 
@@ -477,7 +478,7 @@ static int send_msg(struct kiocb *iocb, 
}
}
 
-   if (down_interruptible(&tsock->sem))
+   if (mutex_lock_interruptible(&tsock->lock))
return -ERESTARTSYS;
 
if (needs_conn) {
@@ -523,7 +524,7 @@ static int send_msg(struct kiocb *iocb, 
}
if (likely(res != -ELINKCONG)) {
 exit:
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
return res;
}
if (m->msg_flags & MSG_DONTWAIT) {
@@ -562,9 +563,8 @@ static int send_packet(struct kiocb *ioc
if (unlikely(dest))
return send_msg(iocb, sock, m, total_len);
 
-   if (down_interruptible(&tsock->sem)) {
+   if (mutex_lock_interruptible(&tsock->lock))
return -ERESTARTSYS;
-   }
 
do {
if (unlikely(sock->state != SS_CONNECTED)) {
@@ -578,7 +578,7 @@ static int send_packet(struct kiocb *ioc
res = tipc_send(tsock->p->ref, m->msg_iovlen, m->msg_iov);
if (likely(res != -ELINKCONG)) {
 exit:
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
return res;
}
if (m->msg_flags & MSG_DONTWAIT) {
@@ -846,7 +846,7 @@ static int recv_msg(struct kiocb *iocb, 
 
/* Look for a message in receive queue; wait if necessary */
 
-   if (unlikely(down_interruptible(&tsock->sem)))
+   if (unlikely(mutex_lock_interruptible(&tsock->lock)))
return -ERESTARTSYS;
 
 restart:
@@ -930,7 +930,7 @@ restart:
advance_queue(tsock);
}
 exit:
-   up(&tsock->sem);
+   mutex_unlock(&tsock->lock);
return res;
 }
 
@@ -981,7 +981,7 @@ static int recv_stream(struct kiocb *ioc
 
/* Look for a message in receive queue; wait if necessary */
 
-   if (unlikely(down_interruptible(&tsock->sem)))
+   if (unlikely(mutex_lock_interruptible(&tsock->lock)))
return -ERESTARTSYS;
 
 

Re: 2.6.24-rc4-mm1

2007-12-09 Thread Dave Young
On Dec 8, 2007 6:22 AM, Luis R. Rodriguez <[EMAIL PROTECTED]> wrote:
> On Dec 6, 2007 9:12 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some 
> > inline functions like this:
> > drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining 
> > failed in call to 'ath5k_extend_tsf': function body not available
> >
> > Fix it by adjusting the order of the inline function bodies.
> >
> > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> Acked-by: Luis R. Rodriguez <[EMAIL PROTECTED]>

Thanks.

>
> Thanks Dave. What version of gcc were you using? I haven't run into this.

gcc 3.4.6
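For readers who hit the same "inlining failed ... function body not available" error: the report suggests that this gcc 3.4.6 build would not inline a forced-inline function whose body only appears after its first call, and the patch fixes it purely by reordering. A minimal sketch of the working order, assuming (as kernel builds of this period did) that `inline` is forced with `__attribute__((always_inline))`; `extend_counter()` and `rx_timestamp()` are hypothetical names, not the ath5k code:

```c
#include <stdint.h>

/* Hypothetical helper. Its body comes first, so a compiler that only
 * inlines bodies it has already seen can honour always_inline. */
static inline __attribute__((always_inline)) uint64_t
extend_counter(uint64_t full, uint32_t low15)
{
	/* Keep the high bits of the 64-bit counter, replace the low 15. */
	return (full & ~(uint64_t)0x7fff) | (low15 & 0x7fff);
}

/* Caller defined after the inline body. Putting this function above
 * extend_counter() (with only a prototype in front of it) is the kind
 * of ordering gcc 3.4 rejected in the report. */
static uint64_t rx_timestamp(uint64_t tsf, uint32_t rstamp)
{
	return extend_counter(tsf, rstamp);
}
```

No code changes meaning here; only the definition order differs from the failing layout.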

>
> BTW, nothing new was added in this patch, things were just shifted,
> but even that may be copyrightable. Is it fair to assume you are
> licensing these changes under the same license the file is in?

Ok, I don't care.

>
> For this file we'd usually use:
>
> Changes-licensed-under: 3-clause-BSD
>
> For future reference:
>
> http://linuxwireless.org/en/developers/Documentation/SubmittingPatches#Changes-licensed-undertag
>
>   Luis
>


Re: laptop reboots right after hibernation

2007-12-09 Thread Tejun Heo
Kjartan Maraas wrote:
>> Hmmm... Ah.. okay.  Wrongly split patch.  Can you please do it one
>> more time?
>>
> Attached.

Alright, it works now, but it seems both dmesgs are from the no-filter patch.
I'm pretty sure the other one works too, because one of your previous dmesgs
showed it working.  Please double check.

Thanks.

-- 
tejun


Re: RT Load balance changes in sched-devel

2007-12-09 Thread Steven Rostedt

Gregory Haskins wrote:

btw., both cases would be addressed by placing load-balance points
into sched_class_rt->{enqueue,dequeue}_task_rt()... push_rt_tasks()
and pull_rt_tasks() respectively. As a side effect (I think,
technically, it would be possible), 3 out of 4 *_balance_rt() calls
(the exception: schedule_tail_balance_rt()) in schedule() would become
unnecessary.

_BUT_

the enqueue/dequeue() interface would become less straightforward,
logically speaking.
Something like:


Also, push and pull_rt use activate/deactivate as well, so this would 
make that code a bit more complex.




rq = activate_task(rq, ...); /* may unlock rq and lock/return another one */


would complicate the existing use cases.



I think I would prefer to just fix the setscheduler/setprio cases for the class 
transition than change the behavior of these enqueue/dequeue calls.  But I will 
keep an open mind as I look into this issue.


I agree with Gregory on this. I prefer to fix the two you found. I 
thought about them before, but somehow they were missed :-/


Anyway, I'll be working on adding some more patches on Monday. There may 
be other ways to clean this up.




Thanks for the review!


Yeah, thanks from me too!

-- Steve



hwclock --systohc locks machine up, RTC conflict problems

2007-12-09 Thread kinesis

Issuing the command hwclock --systohc (and sometimes hwclock --hctosys at boot)
causes my machine to freeze.

The reason is that there is a problem loading rtc-cmos. I should mention that my
machine also runs Windows, which detects the device fine as a Real Time
Clock/CMOS driver at I/O range 0070-0071. According to /proc/ioports, rtc is
probed at 0070-0077.

During boot we get the following when attempting to modprobe rtc-cmos:


[EMAIL PROTECTED]:/usr/src/linux-2.6# dmesg|grep rtc
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
rtc_cmos 00:09: i/o registers already in use
rtc_cmos: probe of 00:09 failed with error -16
[EMAIL PROTECTED]:/usr/src/linux-2.6#



This has happened with the stock kernel that came with my distribution as
well.

Apparently only 0070-0071 can be used for my rtc? How can I get the device
to work?

here is /proc/interrupts:

  CPU0   CPU1
  0:  23106   13272660   IO-APIC-edge  timer
  1: 49   5781   IO-APIC-edge  i8042
  8:  0  1   IO-APIC-edge  rtc
  9:159   9301   IO-APIC-fasteoi   acpi
 12:   1818 190257   IO-APIC-edge  i8042
 14:  0 35   IO-APIC-edge  ide0
 17:   2174 312677   IO-APIC-fasteoi   nvidia
 18:363  27432   IO-APIC-fasteoi   HDA Intel
 19:   1914 283347   IO-APIC-fasteoi   ndiswrapper
 20:   64341328206   IO-APIC-fasteoi   eth0
 21:  0  0   IO-APIC-fasteoi   ohci_hcd:usb2
 22:  0  0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:209  19944   IO-APIC-fasteoi   sata_nv
NMI:  0  0   Non-maskable interrupts
LOC:   13272488  22989   Local timer interrupts
RES: 105913  91197   Rescheduling interrupts
CAL:151134   function call interrupts
TLB:  18445  19521   TLB shootdowns
TRM:  0  0   Thermal event interrupts
THR:  0  0   Threshold APIC interrupts
SPU:  0  0   Spurious interrupts
ERR:  0
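The "i/o registers already in use" line together with error -16 (-EBUSY on Linux) suggests that an existing claim already covers part of the port range rtc-cmos asked for. A minimal sketch of checking that by hand, assuming `/proc/ioports` lines of the form `0070-0077 : rtc`; `struct ioport_claim`, `parse_ioport_line()` and `ranges_overlap()` are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* One claimed I/O port range as printed in /proc/ioports. */
struct ioport_claim {
	unsigned long start;
	unsigned long end;	/* inclusive */
	char name[64];
};

/* Parse a line such as "0070-0077 : rtc". Leading spaces (used for
 * nested entries) are skipped. Returns 1 on success, 0 otherwise. */
static int parse_ioport_line(const char *line, struct ioport_claim *out)
{
	return sscanf(line, " %lx-%lx : %63[^\n]",
		      &out->start, &out->end, out->name) == 3;
}

/* Two inclusive ranges overlap iff each one starts before the other ends. */
static int ranges_overlap(unsigned long a_start, unsigned long a_end,
			  unsigned long b_start, unsigned long b_end)
{
	return a_start <= b_end && b_start <= a_end;
}
```

For example, a `0070-0077 : rtc` entry overlaps the 0x70-0x71 pair that Windows reports, while a range such as 0x80-0x8f does not.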


-- 
View this message in context: 
http://www.nabble.com/hwclock---systohc-locks-machine-up%2C-RTC-conflict-problems-tp14245613p14245613.html
Sent from the linux-kernel mailing list archive at Nabble.com.



Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Andreas Mohr
On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:
> IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
> actually wrongly implemented but simply expects the IDE controller values
> to have been set up "differently".
> 
> 
> Or... one could possibly even infer from this that - maybe -
> the _GTM invocation spot is wrong, it should be done somewhere
> different during bootup. Or whatever.

"Whatever" indeed:

There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
and the possible register values are:

Package (0x04)
{
0x20,
0x31,
0x65,
0xA8
},

and from

OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
Field (CFG2, DWordAcc, NoLock, Preserve)
{
Offset (0x08),
SSPT,   8,
SMPT,   8,
PSPT,   8,
PMPT,   8,
Offset (0x10),
...
we can infer that at PCI_Config offset 0x48 those values should be located.
However after bootup or resume there are:

# lspci -s 00:11.1 -xxx
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00


As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
99 99 20 20, and the two 0x99 values do not appear in the array above.
This is because the secondary port is unused, as can also be seen
from my bootup log:

scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
ata1.00: 234441648 sectors, multi 16: LBA
ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
Switched to high resolution mode on CPU 0
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)


Manually tweaking the values to 20 20 20 20 does indeed skip the _GTM failure
message on suspend, only for it to reappear right on resume because the
99 99 20 20 combination is back.
If I don't tweak, I get the _GTM failure at both suspend and resume.
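The offset arithmetic and the Match() check above can be sketched in C using the bytes from the lspci dump. `timing_field_offset()` and `match_timing()` are hypothetical helpers, not ACPICA code, and -1 stands in for the Ones value that ASL Match() returns when nothing matches:

```c
#include <stddef.h>
#include <stdint.h>

#define CFG2_BASE  0x40u	/* OperationRegion (CFG2, PCI_Config, 0x40, 0x20) */
#define TIMING_OFF 0x08u	/* Offset (0x08) inside CFG2 */

/* Config-space offset of the n-th 8-bit timing field:
 * 0 = SSPT, 1 = SMPT, 2 = PSPT, 3 = PMPT. */
static unsigned timing_field_offset(unsigned n)
{
	return CFG2_BASE + TIMING_OFF + n;
}

/* Emulate Match() against the four-entry package: return the index of
 * the value, or -1 where the AML would get Ones (no match). */
static int match_timing(uint8_t value)
{
	static const uint8_t valid[] = { 0x20, 0x31, 0x65, 0xA8 };
	size_t i;

	for (i = 0; i < sizeof(valid) / sizeof(valid[0]); i++)
		if (valid[i] == value)
			return (int)i;
	return -1;
}
```

With the dumped row at 0x40, SSPT and SMPT (0x48, 0x49) read 0x99 and fail to match, while PSPT and PMPT (0x4a, 0x4b) read 0x20 and match, consistent with only the CHN1 (secondary) _GTM failing.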


As such, one can conclude that this BIOS is rather confused when _GTM is
called on an entirely unused controller port. This is either because the BIOS
is dumb or because ACPI doesn't really expect anyone to call _GTM on an unused
physical port. I'd bet on the latter...
(however, I haven't yet found ACPI 3.0b explicitly mentioning this anywhere)

Andreas Mohr


Re: [patch-early-RFC 00/10] LTTng architecture dependent instrumentation

2007-12-09 Thread Mathieu Desnoyers
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * Ingo Molnar ([EMAIL PROTECTED]) wrote:
> > 
> > hi Mathieu,
> > 
> > * Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi,
> > > 
> > > Here is the architecture dependent instrumentation for LTTng. [...]
> > 
> > A fundamental observation about markers, and i raised this point many 
> > many months ago already, so it might sound repetitive, but i'm unsure 
> > whether it's addressed. Documentation/markers.txt still says:
> > 
> > | * Purpose of markers
> > |
> > | A marker placed in code provides a hook to call a function (probe) 
> > | that you can provide at runtime. A marker can be "on" (a probe is 
> > | connected to it) or "off" (no probe is attached). When a marker is 
> > | "off" it has no effect, except for adding a tiny time penalty 
> > | (checking a condition for a branch) and space penalty (adding a few 
> > | bytes for the function call at the end of the instrumented function 
> > | and adds a data structure in a separate section).
> > 
> > could you please eliminate the checking of the flag, and insert a pure 
> > NOP sequence by default (no extra branches), which is then patched in 
> > with a function call instruction sequence, when the trace point is 
> > turned on? (on architectures that have code patching infrastructure - 
> > such as x86)
> > 
> 
> Hi Ingo,
> 
[...] 
> * No marker at all
> 
> 240300 cycles total
> 12.02 cycles per loop
> 
[...]
> * With my marker implementation (load immediate 0, branch predicted) :
> 
> between 200355 and 200580 cycles total (avg 200400 cycles)
> 10.02 cycles per loop (yes, adding the marker increases performance)
> 
[...]
> * With NOPs :
> 
> avg around 41 cycles total
> 20.5 cycles/loop (slowdown of 2)
> 
>
[...]
> Therefore, because of the cost of stack setup, the load immediate and
> conditional branch seems to be _much_ faster than the NOP alternative.
> 
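The disabled-path cost being compared here ("checking a condition for a branch") can be modelled in a few lines of userspace C. This is a sketch only, assuming a plain `int` flag loaded from memory; the LTTng markers instead encode the state as a patched load-immediate. `struct demo_marker`, `demo_marker_hit()`, `demo_marker_arm()` and `count_probe()` are hypothetical names, not the kernel API:

```c
#include <stddef.h>

/* Probe signature: a private pointer plus one argument. */
typedef void (*probe_fn)(void *priv, long arg);

/* Userspace model of a marker site; the state is an ordinary int here,
 * not an immediate operand patched into the instruction stream. */
struct demo_marker {
	const char *name;
	int state;		/* 0 = off (no probe connected), 1 = on */
	probe_fn probe;
	void *priv;
};

/* Called at the instrumented site. When off, the cost is one load and
 * one branch hinted as not taken; the probe call only happens when on. */
static inline void demo_marker_hit(struct demo_marker *m, long arg)
{
	if (__builtin_expect(m->state != 0, 0))
		m->probe(m->priv, arg);
}

static void demo_marker_arm(struct demo_marker *m, probe_fn fn, void *priv)
{
	m->probe = fn;
	m->priv = priv;
	m->state = 1;
}

static void demo_marker_disarm(struct demo_marker *m)
{
	m->state = 0;
}

/* Example probe: counts hits and accumulates the argument. */
struct probe_stats { long hits; long sum; };

static void count_probe(void *priv, long arg)
{
	struct probe_stats *s = priv;

	s->hits++;
	s->sum += arg;
}
```

The NOP alternative Ingo asked for would remove even the load and branch, but as the numbers above indicate, the caller still pays for argument and stack setup around the patched call site.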

I wanted to know what clever things the dtrace guys have done, so I just
dug into the dtrace code today, and it isn't pretty for x86.

For the kernel sdt (static dtrace), the closest match to markers, they:

1 - Use the linker to turn the calls to an undefined symbol into 
"0x90 0x90 0x90 0x90 0x90" (5 nops)
(note that they still suffer from the stack setup cost even when
disabled. Therefore, performance-wise, I think the markers are already
faster)

But let's dig deeper..

2 - When what they call a "provider" is activated, the first byte of the
"instruction" (actually, it would be the second NOP) is changed to 0xf0
(a lock prefix):

"0x90 0xf0 0x90 0x90 0x90"

3 - When this site is hit, the 0xf0 0x90 instruction produces an
illegal opcode fault. In the handler, they emulate a trap by incrementing
EIP by the size of the illegal op. They look up the faulting EIP in a hash
table to find which site caused it, and then they call the dtrace_probe
function to call the consumers from there.
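Steps 1 to 3 can be modelled in userspace C to show why the enabled path pays for a fault plus a hash lookup before any probe runs. This is a rough sketch, not dtrace's code: it assumes kernel text can be represented as a byte array with addresses as offsets into it, and `struct sdt_site`, `sdt_register()`, `sdt_enable()` and `sdt_handle_fault()` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

#define NOP         0x90
#define LOCK_PREFIX 0xf0
#define NBUCKETS    16		/* hypothetical hash table size */

/* One probe site; addresses are offsets into a fake text array. */
struct sdt_site {
	size_t fault_addr;	/* offset of the 0xf0 byte (site + 1) */
	int probe_id;
	struct sdt_site *next;	/* hash bucket chain */
};

static struct sdt_site *sdt_table[NBUCKETS];

static size_t sdt_hash(size_t addr)
{
	return addr % NBUCKETS;
}

/* Remember a 5-NOP site left by the linker and the probe it belongs to. */
static void sdt_register(struct sdt_site *s, size_t site, int probe_id)
{
	size_t b;

	s->fault_addr = site + 1;
	s->probe_id = probe_id;
	b = sdt_hash(s->fault_addr);
	s->next = sdt_table[b];
	sdt_table[b] = s;
}

/* "Activate the provider": 90 90 90 90 90 becomes 90 f0 90 90 90, so the
 * CPU decodes "lock nop", which raises an invalid-opcode fault. */
static void sdt_enable(uint8_t *text, size_t site)
{
	text[site + 1] = LOCK_PREFIX;
}

/* Model of the fault handler: hash the faulting address to find the site,
 * report its probe and return where to resume (past the 2-byte f0 90).
 * Returns (size_t)-1 if the fault did not come from a probe site. */
static size_t sdt_handle_fault(size_t fault_addr, int *probe_id)
{
	struct sdt_site *s;

	for (s = sdt_table[sdt_hash(fault_addr)]; s != NULL; s = s->next) {
		if (s->fault_addr == fault_addr) {
			*probe_id = s->probe_id; /* dtrace_probe() would run here */
			return fault_addr + 2;
		}
	}
	return (size_t)-1;
}
```

Every enabled hit goes through the fault entry, the bucket walk and the resume, which is the overhead the comparison below is about.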

So, if I have not missed anything, they will have the performance cost
of a fault and a hash table lookup on the critical path, which is kind
of dumb. The fault alone adds a few thousand cycles (assuming it
performs like an int3 breakpoint).

Compared to this, my approach (load immediate + branch when disabled,
plus the added function call when enabled) is _much_ more lightweight.

I guess the dtrace approach is good enough on sparc (except for the stack
setup cost when disabled), where they patch the 4-byte NOP into a 4-byte
function call and manage to get good performance, but the hack they
are doing on x86 seems to be just too slow.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

