date:20070503

Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-05-03 Thread Alex Tomas


Andrew Morton wrote:
> On Fri, 04 May 2007 10:18:12 +0400 Alex Tomas <[EMAIL PROTECTED]> wrote:
>
> > Andrew Morton wrote:
> > > Yes, there can be issues with needing to allocate journal space within the
> > > context of a commit.  But
> >
> > no-no, this isn't required. we only need to mark pages/blocks within
> > transaction, otherwise race is possible when we allocate blocks in
> > transaction, then transaction starts to commit, then we mark pages/blocks
> > to be flushed before commit.
>
> I don't understand.  Can you please describe the race in more detail?


if I understood your idea right, then in data=ordered mode, commit thread writes
all dirty mapped blocks before real commit.

say, we have two threads: t1 is a thread doing flushing and t2 is a commit thread

t1                                      t2
find dirty inode I
find some dirty unallocated blocks
journal_start()
allocate blocks
attach them to I
journal_stop()

                                        going to commit
                                        find inode I dirty
                                        do NOT find these blocks because they're
                                          allocated only, but pages/bhs aren't
                                          mapped to them
                                        start commit

map pages/bhs to just allocated blocks


so, either we mark pages/bhs someway within journal_start()--journal_stop() or
commit thread should do lookup for all dirty pages. the latter doesn't sound
nice, IMHO.

thanks, Alex



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread Jakub Jelinek

On Thu, May 03, 2007 at 11:28:15PM -0700, Andrew Morton wrote:
> > > The posix spec implies that negative `len' is permitted - presumably
> > > "allocate ahead of `offset'".  How peculiar.
> > 
> > I just checked the man page for posix_fallocate() and it says:
> > 
> >   EINVAL  offset or len was less than zero.

That describes the current glibc implementation.

> > We should probably follow this lead.
> 
> Yes, I think so.  I'm suspecting that
> http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html
> is just buggy.  Or I can't read.
> 
> I mean, if we're going to support negative `len' then is the byte at
> `offset' inside or outside the segment?  Head spins.
> 
> However it would be neat if someone could test $OTHER_OS and, perhaps more
> importantly, the present glibc emulation (which I assume your manpage is
> referring to, so this would be a manpage test ;)).

int
posix_fallocate (int fd, __off_t offset, __off_t len)
{
  struct stat64 st;
  struct statfs f;

  /* `off_t' is a signed type.  Therefore we can determine whether
     OFFSET + LEN is too large if it is a negative value.  */
  if (offset < 0 || len < 0)
    return EINVAL;
  if (offset + len < 0)
    return EFBIG;

  /* First thing we have to make sure is that this is really a regular
     file.  */
  if (__fxstat64 (_STAT_VER, fd, &st) != 0)
    return EBADF;
  if (S_ISFIFO (st.st_mode))
    return ESPIPE;
  if (! S_ISREG (st.st_mode))
    return ENODEV;

  if (len == 0)
    {
      if (st.st_size < offset)
        {
          int ret = __ftruncate (fd, offset);

          if (ret != 0)
            ret = errno;
          return ret;
        }
      return 0;
    }
...

is what glibc does ATM.  Seems we violate the case where len == 0, as
EINVAL in that case is "shall fail".  But reading the standard to imply
negative len is ok is too much guessing, there is no word what it means
when len is negative and
"required storage for regular file data starting at offset and continuing for 
len bytes"
doesn't make sense for negative size.  
And given the general
"Implementations may support additional errors not included in this list,
may generate errors included in this list under circumstances other than
those described here, or may contain extensions or limitations that prevent
some errors from occurring."
I believe returning EINVAL for len < 0 is not a POSIX violation.
That doesn't mean the standard shouldn't be clarified, whether by saying
EINVAL must be returned for non-positive len or saying that using negative
len has undefined or implementation defined behavior.
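
For illustration only, the argument checks under discussion could be written so
that they never rely on signed overflow (which is undefined behaviour in C).
This is a sketch, not glibc's or the kernel's code: the name
check_falloc_range() and the use of int64_t in place of off_t are assumptions
made for the example, and it takes the strict reading in which len == 0 is
rejected with EINVAL as well.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: validate an (offset, len) pair before preallocating.
   Returns 0 if the range is acceptable, otherwise an errno value.  */
int
check_falloc_range (int64_t offset, int64_t len)
{
  /* Negative values are rejected, as the man page says.  len == 0 is
     rejected too, which is the strict "shall fail" reading.  */
  if (offset < 0 || len <= 0)
    return EINVAL;

  /* offset + len would exceed the largest representable offset.
     Compare against the limit instead of adding, so that the signed
     addition can never overflow.  */
  if (offset > INT64_MAX - len)
    return EFBIG;

  return 0;
}
```

With these checks a call such as check_falloc_range (0, -1) yields EINVAL, and
a range that would run past the largest offset yields EFBIG.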

> The above opengroup page only permits S_ISREG.  Preallocating directories
> sounds quite useful to me, although it's something which would be pretty
> hard to emulate if the FS doesn't support it.  And there's a decent case to
> be made for emulating it - run-anywhere reasons.  Does glibc emulation support
> directories?  Quite unlikely.

No, see above.

Jakub

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Bill Irwin

On Thu, May 03, 2007 at 06:38:21PM -0700, Paul Jackson wrote:
> Adding Christoph Lameter <[EMAIL PROTECTED]> to the cc list, as he knows
> more about hugetlb pages than I do.
> This patch strikes me as a bit odd.
> Granted, it's solving what could be a touchy problem with a fairly
> simple solution, which is usually a Good Thing(tm).
> However, the idea that different tasks would see different values for
> the following fields in /proc/meminfo:
>   HugePages_Total: 0
>   HugePages_Free:  0
> strikes me as odd, and risky.  I would have thought that usually, all
> tasks in the system should see the same values in the files in /proc
> (as opposed to the files in particular task subdirectories /proc/<pid>.)
> This patch strikes me as a bit of a hack, good for compatibility, but
> hiding a booby trap that will bite some user code in the long run.
> But I'm not enough of an expert to know what the right tradeoffs are
> in this matter.

The semantics of the global /proc/meminfo should not change; a separate
per-cpuset reporting mechanism should really be used.


-- wli

Re: [PATCH] Bluetooth: postpone hci_dev unregistration

2007-05-03 Thread Marcel Holtmann

Hi Jiri,

> (I sent this a week ago but it seems to have got lost in other noise,
> resending)
> 
> From: Jiri Kosina <[EMAIL PROTECTED]>
> 
> Bluetooth: postpone hci_dev unregistration
> 
> Commit b40df57 substituted bh_lock_sock() in hci_sock_dev_event() for 
> lock_sock() when unregistering HCI device, in order to prevent deadlock 
> against locking in l2cap_connect_cfm() from softirq context.
> 
> This however introduces another problem - hci_sock_dev_event() for 
> HCI_DEV_UNREG can also be triggered in atomic context, in which calling 
> lock_sock() is not safe as it could sleep. Reported by Jeremy Fitzhardinge 
> at http://lkml.org/lkml/2007/4/23/271
> 
> This patch moves the detaching of sockets from hci_device into workqueue, 
> so that lock_sock() can be used safely. This requires movement of 
> deallocation of hci_dev - deallocating device just after 
> hci_unregister_dev() would be too soon, as it could happen before the 
> workqueue has been run.

I saw the report on LKML, but I am not really comfortable with this
approach. It feels like an ugly hack. This needs more thinking and I
think that simplifying the locking between HCI and L2CAP should be the
goal.

Regards

Marcel



Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-05-03 Thread Andrew Morton

On Fri, 04 May 2007 10:18:12 +0400 Alex Tomas <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > Yes, there can be issues with needing to allocate journal space within the
> > context of a commit.  But
> 
> no-no, this isn't required. we only need to mark pages/blocks within
> transaction, otherwise race is possible when we allocate blocks in
> transaction, then transaction starts to commit, then we mark pages/blocks
> to be flushed before commit.

I don't understand.  Can you please describe the race in more detail?

Re: Remove constructor from buffer_head

2007-05-03 Thread William Lee Irwin III

On Thu, May 03, 2007 at 08:08:41PM -0700, Christoph Lameter wrote:
> Performance tests show a slight improvement in netperf (not a
> strong case for a performance improvement but removing the
> constructor has definitely no negative impact so why keep
> this around?).

Cache effects are not so easily visible. Cache profile results from
more realistic workloads (e.g. major macrobenchmarks) are more
appropriate for gauging this.


-- wli

Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread Andrew Morton

On Fri, 4 May 2007 16:07:31 +1000 David Chinner <[EMAIL PROTECTED]> wrote:

> On Thu, May 03, 2007 at 09:29:55PM -0700, Andrew Morton wrote:
> > On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:
> > 
> > > This patch implements the fallocate() system call and adds support for
> > > i386, x86_64 and powerpc.
> > > 
> > > ...
> > > +{
> > > + struct file *file;
> > > + struct inode *inode;
> > > + long ret = -EINVAL;
> > > +
> > > + if (len == 0 || offset < 0)
> > > + goto out;
> > 
> > The posix spec implies that negative `len' is permitted - presumably
> > "allocate ahead of `offset'".  How peculiar.
> 
> I just checked the man page for posix_fallocate() and it says:
> 
>   EINVAL  offset or len was less than zero.
> 
> We should probably follow this lead.

Yes, I think so.  I'm suspecting that
http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html
is just buggy.  Or I can't read.

I mean, if we're going to support negative `len' then is the byte at
`offset' inside or outside the segment?  Head spins.

However it would be neat if someone could test $OTHER_OS and, perhaps more
importantly, the present glibc emulation (which I assume your manpage is
referring to, so this would be a manpage test ;)).

> > > +
> > > + ret = -ENODEV;
> > > + if (!S_ISREG(inode->i_mode))
> > > + goto out_fput;
> > 
> > So we return ENODEV against an S_ISBLK fd, as per the posix spec.  That
> > seems a bit silly of them.
> 
> Hmm - I thought that the intention of sys_fallocate() was to
> be generic enough to eventually allow preallocation on directories.
> If that is the case, then this check will prevent that.

The above opengroup page only permits S_ISREG.  Preallocating directories
sounds quite useful to me, although it's something which would be pretty
hard to emulate if the FS doesn't support it.  And there's a decent case to
be made for emulating it - run-anywhere reasons.  Does glibc emulation support
directories?  Quite unlikely.

But yes, sounds like a desirable thing.  Would XFS support it easily if the
above check was relaxed?

Re: [PATCH] Rewrite the MAJOR() macro as a call to imajor().

2007-05-03 Thread Andrew Morton

On Sat, 28 Apr 2007 06:23:54 -0400 (EDT) "Robert P. J. Day" <[EMAIL PROTECTED]> wrote:

> Replace the MAJOR() macro invocation with a call to the inline
> imajor() routine.
> 
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
> 
> ---
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 6b5b642..08da15b 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -710,7 +710,7 @@ static inline int is_loop_device(struct file *file)
>  {
>   struct inode *i = file->f_mapping->host;
> 
> - return i && S_ISBLK(i->i_mode) && MAJOR(i->i_rdev) == LOOP_MAJOR;
> + return i && S_ISBLK(i->i_mode) && imajor(i) == LOOP_MAJOR;
>  }

there's no runtime change, and I count a couple hundred MAJORs in the tree.

I don't want to receive 200 one-line patches please.  If you're going to
do this then please do decent-sized per-subsystem patches and see if you can
persuade the subsystem maintainers to take them directly.
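
For readers wondering why there is no runtime change: imajor() is a trivial
inline wrapper that applies MAJOR() to the inode's i_rdev.  A user-space sketch
of that relationship follows.  The sketch_/SKETCH_ names and the cut-down
struct sketch_inode are invented for the example; the 20-bit minor field is
taken to mirror the kernel-internal dev_t layout.

```c
#include <stdint.h>

#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1U << SKETCH_MINORBITS) - 1)

#define SKETCH_MAJOR(dev)    ((unsigned int) ((dev) >> SKETCH_MINORBITS))
#define SKETCH_MINOR(dev)    ((unsigned int) ((dev) & SKETCH_MINORMASK))
#define SKETCH_MKDEV(ma, mi) ((((uint32_t) (ma)) << SKETCH_MINORBITS) | (mi))

/* Cut-down stand-in for struct inode: only the device number.  */
struct sketch_inode {
	uint32_t i_rdev;
};

/* Same shape as the inline helper: wrap the macro, nothing more.  */
static inline unsigned int sketch_imajor(const struct sketch_inode *inode)
{
	return SKETCH_MAJOR(inode->i_rdev);
}
```

Both spellings compile to the same shift, so the replacement is purely a
readability change.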

Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-05-03 Thread Alex Tomas


Andrew Morton wrote:
> Yes, there can be issues with needing to allocate journal space within the
> context of a commit.  But

no-no, this isn't required. we only need to mark pages/blocks within
transaction, otherwise race is possible when we allocate blocks in transaction,
then transaction starts to commit, then we mark pages/blocks to be flushed
before commit.

> a) If the page has newly allocated space on disk then the metadata which
>    refers to that page is already in the journal: no new journal space
>    needed.
>
> b) If the page doesn't have space allocated on disk then we don't need
>    to write it out at ordered-mode commit time, because the post-recovery
>    filesystem will not have any references to that page.
>
> c) If the page is dirty due to overwrite then no metadata update was required.
>
> IOW, under what circumstances would an ordered-mode commit need to allocate
> space for a delayed-allocate page?


no need to allocate space within commit thread, I think. only to take care
of the race I described above. in hackish version of data=ordered for delayed
allocation I used counter of submitted bio's with newly-allocated blocks and
commit thread waits for the counter to reach 0.
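
A user-space sketch of such a counter, using C11 atomics, might look like the
following.  It is only an illustration of the idea: every name is invented, the
real thing would live in the kernel and sleep on a wait queue instead of being
polled, and it is assumed that the increment happens while the handle is still
open and the decrement happens in the I/O completion path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-transaction state: number of submitted but not yet
   completed writes that target newly allocated blocks.  */
struct txn_sketch {
	atomic_int pending_new_block_io;
};

void txn_init(struct txn_sketch *t)
{
	atomic_init(&t->pending_new_block_io, 0);
}

/* Flusher (t1): assumed to be called between journal_start() and
   journal_stop(), so the commit thread cannot miss the write.  */
void txn_io_submitted(struct txn_sketch *t)
{
	atomic_fetch_add(&t->pending_new_block_io, 1);
}

/* I/O completion path: the data has reached disk.  */
void txn_io_completed(struct txn_sketch *t)
{
	atomic_fetch_sub(&t->pending_new_block_io, 1);
}

/* Commit thread (t2): the commit may proceed only once this is true.  */
bool txn_data_flushed(struct txn_sketch *t)
{
	return atomic_load(&t->pending_new_block_io) == 0;
}
```

The commit thread would wait until txn_data_flushed() holds before writing the
commit record, which closes the window described in the t1/t2 race.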



> However b) might lead to the hey-my-file-is-full-of-zeroes problem.



thanks, Alex


Re: [-mm Patch]nbd: check the return value of sysfs_create_file

2007-05-03 Thread Andrew Morton

On Sat, 28 Apr 2007 13:30:23 +0800 WANG Cong <[EMAIL PROTECTED]> wrote:

> Since 'sysfs_create_file' is declared with attribute warn_unused_result, we
> must always check its return value carefully.
> 

Well that's not really the reason for your patch.

warn_unused_result is there to tell us that there are deeper problems in
the code which need addressing: the failure to check the
sysfs_create_file() return value means that bugs in the kernel can remain
undetected, or can be harder to find.

> 
> ---
> 
> --- linux-2.6.21-rc7-mm2/drivers/block/nbd.c.orig 2007-04-27 17:27:47.0 +0800
> +++ linux-2.6.21-rc7-mm2/drivers/block/nbd.c  2007-04-27 17:47:32.0 +0800
> @@ -373,7 +373,10 @@ static void nbd_do_it(struct nbd_device 
>   BUG_ON(lo->magic != LO_MAGIC);
>  
>   lo->pid = current->pid;
> - sysfs_create_file(&lo->disk->kobj, &pid_attr.attr);
> + if (sysfs_create_file(&lo->disk->kobj, &pid_attr.attr)) {
> + printk(KERN_ERR "nbd: sysfs_create_file failed!");
> + return;
> + }
>  
>   while ((req = nbd_read_stat(lo)) != NULL)
>   nbd_end_request(req);

It would be saner to propagate this error back through callers:

--- a/drivers/block/nbd.c~nbd-check-the-return-value-of-sysfs_create_file-fix
+++ a/drivers/block/nbd.c
@@ -366,23 +366,25 @@ static struct disk_attribute pid_attr = 
 	.show = pid_show,
 };
 
-static void nbd_do_it(struct nbd_device *lo)
+static int nbd_do_it(struct nbd_device *lo)
 {
 	struct request *req;
+	int ret;
 
 	BUG_ON(lo->magic != LO_MAGIC);
 
 	lo->pid = current->pid;
-	if (sysfs_create_file(&lo->disk->kobj, &pid_attr.attr)) {
+	ret = sysfs_create_file(&lo->disk->kobj, &pid_attr.attr);
+	if (ret) {
 		printk(KERN_ERR "nbd: sysfs_create_file failed!");
-		return;
+		return ret;
 	}
 
 	while ((req = nbd_read_stat(lo)) != NULL)
 		nbd_end_request(req);
 
 	sysfs_remove_file(&lo->disk->kobj, &pid_attr.attr);
-	return;
+	return 0;
 }
 
 static void nbd_clear_que(struct nbd_device *lo)
@@ -572,7 +574,9 @@ static int nbd_ioctl(struct inode *inode
 	case NBD_DO_IT:
 		if (!lo->file)
 			return -EINVAL;
-		nbd_do_it(lo);
+		error = nbd_do_it(lo);
+		if (error)
+			return error;
 		/* on return tidy up in case we have a signal */
 		/* Forcibly shutdown the socket causing all listeners
 		 * to error
_


Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Ken Chen


On 5/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote:

> Note, Ken, that if we did that, the calculation of these new Total and
> Free stats would be a little different than your new code.  Instead of
> looping over the memory nodes in the current task's mems_allowed mask,
> we would loop over the memory nodes allowed in the cpuset being queried
> (the cpuset whose 'hugepages_total' or 'hugepages_free' special
> file we were reading, not the current task's cpuset.)


This is even more controversial and messy.  akpm already dropped the
patch and expressed that he doesn't like it.  And I won't go down
another messy path. I will let this idea RIP.

Re: [PATCH 2/2] revoke: change revoke_table to fileset and revoke_details

2007-05-03 Thread Pekka J Enberg

On Thu, 3 May 2007, Andrew Morton wrote:
> Well that's the "locking" protocol then: each instance of this structure is
> only ever touched by a single thread, yes?

Yes. Each do_revoke() call creates a new instance.

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Paul Jackson

David wrote:
> This information is already exported to userspace through sysfs.  Simply 
> grab the N-mems allowed to your task from /proc/pid/status, cat 
> /sys/devices/system/node/nodeN/meminfo for each N, and add.

Good point.

I don't see how this present patch, to change /proc/meminfo,
can be justified, given this.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: Correct location for ADC/DAC drivers

2007-05-03 Thread Stefan Roese

On Wednesday 02 May 2007 21:11, Russell King wrote:
> > > Is there a maintainer for this "drivers/mfd" directory?
> >
> > rmk
>
> I wouldn't go that far.  There's no real infrastructure there
> to maintain, so I'd actually say that the directory was
> maintainerless.  However, I'll own up to the UCB/MCP drivers
> in there.

So perhaps you could answer if you feel that these ADC & DAC chrdev device
drivers would fit into this drivers/mfd directory, or are better suited for
the drivers/char directory?

Thanks.

Best regards,
Stefan

Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread David Chinner

On Thu, May 03, 2007 at 09:29:55PM -0700, Andrew Morton wrote:
> On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:
> 
> > This patch implements the fallocate() system call and adds support for
> > i386, x86_64 and powerpc.
> > 
> > ...
> > +{
> > +   struct file *file;
> > +   struct inode *inode;
> > +   long ret = -EINVAL;
> > +
> > +   if (len == 0 || offset < 0)
> > +   goto out;
> 
> The posix spec implies that negative `len' is permitted - presumably "allocate
> ahead of `offset'".  How peculiar.

I just checked the man page for posix_fallocate() and it says:

  EINVAL  offset or len was less than zero.

We should probably follow this lead.

> > +
> > +   ret = -ENODEV;
> > +   if (!S_ISREG(inode->i_mode))
> > +   goto out_fput;
> 
> So we return ENODEV against an S_ISBLK fd, as per the posix spec.  That
> seems a bit silly of them.

Hmm - I thought that the intention of sys_fallocate() was to
be generic enough to eventually allow preallocation on directories.
If that is the case, then this check will prevent that.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: [git pull] New firewire stack

2007-05-03 Thread Bill Fink

On Thu, 03 May 2007, Kristian Høgsberg wrote:

> Adrian Bunk wrote:
> >> | An advantage of changing the names is that they are now prefixed.
> >>
> >> Is the opportunity to clean up module names compelling enough, vs. (the
> >> wish for) minimized trouble with scripts which refer to module names?
> >> ...
> > 
> > How big is the trouble actually?
> 
> Exactly.  In Fedora we've just added a fw-sbp2 case to mkinitrd, it's only a 
> couple of lines of extra shell code:
> 
>  elif [ "$modName" = "fw-sbp2" ]; then
>  findmodule fw-core
>  findmodule fw-ohci
>  modName="fw-sbp2"
> 
> and that's the extent of the changes.  The sbp2 case for the old drivers is 
> still in there and in the end mkinitrd works with either stack.
> 
> Kristian

I also think both stacks should be provided in the mainline kernel,
preferably in their own separate directories.  I still need the old
stack for dv1394, which isn't available in the new stack.  But if
the new stack is also there, I might be motivated for example to try
out the new sbp2 module, to see how well it works and how it compares
in performance to the old sbp2 module.  If it's not there, I'm probably
not going to go out of my way to download it from the net, since my
existing setup is working just fine for me.

-Bill

Re: [PATCH 0/2] natsemi: Improve DspCfg workaround

2007-05-03 Thread Rafał Bilski

> The natsemi driver contains a workaround for broken hardware which can
> on some boards cause more problems than it solves.  The following patch
> series improves this by making the diagnostic more obvious and allowing
> users to disable the workaround if it causes them problems.

Works great. Thank You all for help.

Thanks
Rafał





Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Paul Jackson

Andrew wrote:
> If it's per-cpuset information then shouldn't it be presented in
> /dev/cpuset/something?

Yeah - if huge pages were mainline future, rather than the more
controversial sideline they are now, then it would make more sense
to put in these stats in each cpuset.

Note, Ken, that if we did that, the calculation of these new Total and
Free stats would be a little different than your new code.  Instead of
looping over the memory nodes in the current task's mems_allowed mask,
we would loop over the memory nodes allowed in the cpuset being queried
(the cpuset whose 'hugepages_total' or 'hugepages_free' special
file we were reading, not the current task's cpuset.)

But I'm reluctant to entertain such cpuset additions until I see more
of where my colleague Christoph is going in related work.

Clearly as can be seen on one of his posts on the parallel lkml thread:

  Re: + pretend-cpuset-has-some-form-of-hugetlb-page-reservation.patch added to -mm tree

earlier today, Christoph is no great fan of the current implementation
of huge pages.

And clearly as memory continues to get bigger, we will be putting more
stress on these page size related mechanisms.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Detecting process death for anycast named process monitoring

2007-05-03 Thread Russell King

On Wed, May 02, 2007 at 06:12:27PM -0500, David M. Lloyd wrote:
> On Wed, 2007-05-02 at 16:30 -0600, Chris Friesen wrote:
> > Glen Turner wrote:
> > 
> > > The question is, how can a process with no relationship to another
> > > process detect that process unexpectedly dying?  If named goes
> > > away to a better place, we want to shut down the interface
> > > which causes Quagga to inject the anycast route.
> 
> > We did something similar where arbitrary processes can register to be 
> > sent an arbitrary signal when the state of other processes change.
> 
> What about something like inotify, but for processes?  That would be
> cool...

Or maybe just ignoring the SIGHUP before exec'ing the named process as
a child.
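
One way to read this suggestion is that the monitor starts named itself, so
that the ordinary parent/child machinery (SIGCHLD, waitpid()) reports its
death.  A sketch under that assumption follows; the function name
run_monitored() is invented, and the shutdown of the anycast interface is only
marked by a comment.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PATH with ARGV as a child that ignores SIGHUP,
   block until it dies, and return its exit status (127 if the exec failed,
   -1 if it could not be forked or was killed by a signal).  */
int run_monitored(const char *path, char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* An ignored signal stays ignored across exec.  */
		signal(SIGHUP, SIG_IGN);
		execv(path, argv);
		_exit(127);	/* exec failed */
	}

	/* The parent learns of the child's death here; this is the point
	   at which the anycast interface would be shut down.  */
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only covers the case where the monitor is free to be named's parent; it
does not answer the original question about unrelated processes.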

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH 1/5] Power Management: use mutexes instead of semaphores

2007-05-03 Thread Andrew Morton

On Fri, 27 Apr 2007 10:43:22 +0200 Matthias Kaehlcke <[EMAIL PROTECTED]> wrote:

> the Power Management code uses semaphores as mutexes. use the mutex
> API instead of the (binary) semaphores

I know it's a little thing, but given a choice between

a) changelogs which use capital letters and fullstops and

b) changelogs which do not,

I think a) gives a better result.

I note that none of these patches added a #include <linux/mutex.h>.  Each C
file which uses mutexes should do that, rather than relying upon accidental
nested includes.  I hope you're checking for that.


Re: [PATCH] change global zonelist order v4 [0/2]

2007-05-03 Thread Andrew Morton

On Fri, 27 Apr 2007 14:45:30 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> Hi, this is version 4. including Lee Schermerhorn's good rework.
> and automatic configuration at boot time.

hm, this adds rather a lot of code.  Have we established that it's worth
it?

And it's complex - how do poor users know what to do with this new control?

This:

+ * = "[dD]efault"|"0" - default, automatic configuration.
+ * = "[nN]ode"|"1" - order by node locality,
+ *   then zone within node.
+ * = "[zZ]one"|"2" - order by zone, then by locality within zone

seems a bit excessive.  I think just the 0/1/2 plus documentation would
suffice?

I haven't followed this discussion very closely I'm afraid.  If we came up
with a good reason why Linux needs this feature then could someone please
(re)describe it?

Thanks.

Re: kernel/relay.c: a strange usage of delayed_work

2007-05-03 Thread Tom Zanussi

On Fri, 2007-05-04 at 01:38 +0400, Oleg Nesterov wrote:
> relay_switch_subbuf() does schedule_delayed_work(&buf->wake_readers, 1),
> wakeup_readers() only does wake_up_interruptible() and nothing more.
> 
> Why can't we use a plain timer for this?
> 
> In any case, this "wake_up ->read_wait after a minimal possible delay"
> looks somewhat strange to me, could you explain? just curious.
> 

The reason it's done that way is that if the event that causes the
relay_switch_subbuf() happens to be an event logged from schedule(), and
we directly call wake_up_interruptible() at that point, we lock up the
machine because it ends up back in schedule().  Deferring it avoids the
problem.

I don't see any problem with using a plain timer instead - I'll work up
a patch to make that change.

Tom


Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)

2007-05-03 Thread Joseph Fannin

On Mon, Apr 30, 2007 at 10:43:10AM -0700, William Lee Irwin III wrote:

> +   Allocates the stack physically discontiguously and from high
> +   memory. Furthermore an unmapped guard page follows the stack.
> +   This is not for end-users. It's intended to trigger fatal
> +   system errors under various forms of stack abuse.

Why is this not for end-users?  Will it not trigger anything
useful unless set up properly, or is it a big performance hit -- and how,
or what?

All the kernel debug options are underdocumented this way -- I'd
like to have as many of them on as I can without absolutely killing
performance, (or rather, *you* would) -- but I can never tell without
grovelling all over for the info, which... well, I haven't done it
yet, anyway.

"End-user" is just insufficiently defined for anyone compiling
their own kernel.  Could you add a bit more text here describing what
the effect of physically discontiguous high-memory stacks is?  An
additional frobnitz dereference on every badda-bing badda-bang, likely
to double the time it takes to dance the hokey pokey?

   *shrug*  Some of those debug options probably don't get set very
often on kernels that are run for more than to see if it boots.

--
Joseph Fannin
[EMAIL PROTECTED]

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread David Rientjes

On Thu, 3 May 2007, Paul Jackson wrote:

>  2) adding two new values, by such names as:
> 
>   Current_Cpuset_HugePages_Total:0
>   Current_Cpuset_HugePages_Free: 0
> 

This information is already exported to userspace through sysfs.  Simply 
grab the N-mems allowed to your task from /proc/pid/status, cat 
/sys/devices/system/node/nodeN/meminfo for each N, and add.
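
The "cat each node's meminfo and add" step can be sketched in C as below. This is only a sketch under assumptions: the caller has already read the per-node meminfo files for its allowed nodes into one buffer, and each relevant line has the form "Node N HugePages_Total: <count>". The function name sum_meminfo_field is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the number that follows every occurrence of `field`
 * (for example "HugePages_Total:") in `text`, where `text` holds the
 * concatenated contents of the per-node meminfo files.
 * Hypothetical helper, not an existing kernel or libc interface.
 */
unsigned long sum_meminfo_field(const char *text, const char *field)
{
    unsigned long total = 0;
    size_t flen;
    const char *p;

    assert(text != NULL && field != NULL);
    flen = strlen(field);
    assert(flen > 0);           /* an empty field name would never advance */

    p = text;
    while ((p = strstr(p, field)) != NULL) {
        p += flen;              /* step past the field name */
        total += strtoul(p, NULL, 10);  /* strtoul skips leading blanks */
    }
    return total;
}
```

Passing the text of node0 and node1 with 20 and 30 huge pages would give 50, matching the per-cpuset totals discussed elsewhere in this thread.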

Re: Routing 600+ vlan's via linux problems (looks like arp problems)

2007-05-03 Thread Willy Tarreau

On Fri, May 04, 2007 at 05:48:18AM +0200, Øyvind Vågen Jægtnes wrote:
> Hi again :)
> 
> On 5/4/07, Willy Tarreau <[EMAIL PROTECTED]> wrote:
> >On Thu, May 03, 2007 at 11:12:09PM +0200, Øyvind Vågen Jægtnes wrote:
> >> On 5/3/07, Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> >> >
> >> >On May 3 2007 22:53, Willy Tarreau wrote:
> >> >>> For the rest all we see in the arp cache is (incomplete)
> >> >>
> >> >>I suspect that your arp cache is full (128 entries by default).
> >> >>Check /proc/sys/net/ipv4/neigh/default/gc_thresh1 (128 for me). You can
> >> >>set it as high as gc_thresh2 (512 for me), and I don't know what
> >> >>happens above.
> >> >
> >> >Above, you will perhaps need the not-so-elegant userspace arpd :-/
> >>
> >> Yes, i was suspecting that the arp cache got full, but i will try
> >> increasing it :)
> >> Would there be any huge bugs if i change these lines in arp.c:
> >>
> >>.gc_thresh1 =   128,
> >>.gc_thresh2 =   512,
> >>
> >> to
> >>
> >>.gc_thresh1 =   700,
> >>.gc_thresh2 =   700,
> >>
> >> under the definition for struct arp_tbl?
> >
> >I don't think it could cause a problem, but network people will surely
> >correct me if I'm wrong.
> 
> System is up and running perfectly now, it is routing everything at
> about 200 mbps now with only 5% load avg with the above changes to
> arp.c
> 
> So the real question now is, why is this number so low by default?
> It would probably be much better if this could be handled dynamically
> in the kernel.

I remember I read an argument against this a long time ago, but I
don't remember where. I think it was some arbitrary decision that
people using more than X ARP entries will need arpd. Most probably
the code path in the ARP updates is/was not optimized to handle a
large number of entries. Think about cable operators who may have
10-2 entries !

> Its a Juniper M7i
> It comes default with a 5400 rpm laptop 2.5" harddrive but now we
> bought a more robust "server" 2.5" harddrive.

The "server" ones are not necessarily more robust, often they are faster.

> It still barfs on the OS
> install, so the linux is doing all the job now. Will get a juniper guy
> to come and fix :)
> 
> As a side note, i'm starting to wonder if it was worth the $20k when i
> could just have a linux machine to do the job with a clone for backup
> ;)

That's often how linux penetrates the enterprise ;-)

Willy


Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread Paul Mackerras

Andrew Morton writes:

> On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:
> 
> > This patch implements the fallocate() system call and adds support for
> > i386, x86_64 and powerpc.
> > 
> > ...
> >
> > +asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
> 
> Please add a comment over this function which specifies its behaviour. 
> Really it should be enough material from which a full manpage can be
> written.

This looks like it will have the same problem on s390 as
sys_sync_file_range.  Maybe the prototype should be:

asmlinkage long sys_fallocate(loff_t offset, loff_t len, int fd, int mode)

Paul.

Re: [linux-dvb] Re: DST/BT878 module customization (.. was: Critical points about ...)

2007-05-03 Thread Trent Piepho

On Thu, 3 May 2007, Mauro Carvalho Chehab wrote:
> On Wed, 2007-05-02 at 04:10 -0700, Trent Piepho wrote:
> > I promise, this time it's right!
> > http://linuxtv.org/hg/~tap/dst-new
>
> Confirmed. Now the patch is properly working. My tests were done with a
> board with DST. Those are the results:
>
> 1) when DST is unselected, on a board with DST, it will print the errors
> indicating that the Kconfig items were not selected:
>
> DVB: registering new adapter (bttv0).
> DVB: Unable to find symbol dst_attach()
> frontend_init: Could not find a Twinhan DST.
> dvb-bt8xx: A frontend driver was not found for device 109e/0878 subsystem fbfb/f800
>
> The only issue is the wrong printk msg, stating that a "frontend driver"
> was not found. As this issue also happens with the current driver, due
> to the usage of the dvb_attach() macro, I don't see any regressions.
>
> It would be nice, however, to have a patch making dvb_attach more
> generic, by e.g. having a variant that allows passing another message.

Only this message is from dvb_attach():
> DVB: Unable to find symbol dst_attach()

It is saying that it cannot load the module that dst_attach() is in (it
doesn't know which module that is; modprobe knows that).  If you enabled dst
support and deleted the module, the message would be the same.

If you turn off dvb_attach() and also disable dst, you should instead get
this message:
dst_attach: driver disabled by Kconfig

Maybe that would look nicer with a "DVB:  " prefix?  That would be easier if it
wasn't necessary to update the printk in each boilerplate stub function.  What
if one macro created these stubs?

> frontend_init: Could not find a Twinhan DST.
> dvb-bt8xx: A frontend driver was not found for device 109e/0878 subsystem fbfb/f800

These two messages are printed by the dvb-bt8xx driver, not by dvb_attach().
It would be trivial to change of course, but I'm not sure what would be
pedantically correct for both dst and non-dst based hardware.

> There's an argument against the prototype changes on dst_attach and
> dst_ca_attach since they aren't frontend.

The reason I changed that, is that dst_attach() already did return a
dvb_frontend pointer, it was just inside an enclosing structure.  i.e. what
existed before:

{
struct dst_state *state;
state = dst_attach(...);
card->fe = &state->frontend;
} /* state goes out of scope */

The frontend is inside the state struct and the state pointer isn't saved
anywhere.  dvb-bt8xx just saves a frontend pointer from inside the dst state
and tosses the state pointer away.  So I changed that to:

card->fe = dst_attach(...);
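
As a rough illustration of why handing back only the embedded frontend pointer loses nothing: the enclosing state can be recovered from it with the usual container_of arithmetic. The sketch below is plain userspace C with made-up names (demo_state, demo_frontend, demo_attach); it is not the dst or dvb-bt8xx code, and the real driver may track its private state in another way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the frontend and the driver state. */
struct demo_frontend {
    int id;
};

struct demo_state {
    int tuner_type;
    struct demo_frontend frontend;  /* embedded, as in the text above */
};

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate the state but return only the embedded frontend. */
struct demo_frontend *demo_attach(int tuner_type)
{
    struct demo_state *state = calloc(1, sizeof(*state));

    if (state == NULL)
        return NULL;
    state->tuner_type = tuner_type;
    state->frontend.id = 1;
    return &state->frontend;
}

/* Recover the enclosing state from the frontend pointer. */
struct demo_state *demo_state_of(struct demo_frontend *fe)
{
    assert(fe != NULL);
    return demo_container_of(fe, struct demo_state, frontend);
}

/* Free the whole allocation through the frontend pointer. */
void demo_release(struct demo_frontend *fe)
{
    free(demo_state_of(fe));
}
```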

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Paul Jackson

Ken wrote:
> If this is odd, do you have any suggestions for alternative?

No, I don't.  Sorry.

It's a touchy problem, and I'm not enough of an expert to know what the
right tradeoffs are in this matter.

I agree with your point that if you realize what's going on, namely
that what cpuset the task reading meminfo is in affects the HugePages
values that are read, then one can use the interface easily enough.

... how about:

 1) don't change the existing HugePages_* values - keep them
system-wide, and

 2) adding two new values, by such names as:

Current_Cpuset_HugePages_Total:0
Current_Cpuset_HugePages_Free: 0

That's certainly an uglier proposal than yours ;).  But at least it
seems clearer, and doesn't make incompatible changes to what's there.

It does require user level code change to actually benefit from the
new values, whereas your patch sort of sneaks them in, on the assumption
that the majority of reads of these values would really prefer getting
the cpuset relative totals instead.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Andrew Morton

On Thu, 3 May 2007 21:49:12 -0700 "Ken Chen" <[EMAIL PROTECTED]> wrote:

> On 5/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote:
> > Adding Christoph Lameter <[EMAIL PROTECTED]> to the cc list, as he knows
> > more about hugetlb pages than I do.
> >
> > This patch strikes me as a bit odd.
> >
> > Granted, it's solving what could be a touchy problem with a fairly
> > simple solution, which is usually a Good Thing(tm).
> >
> > However, the idea that different tasks would see different values for
> > the following fields in /proc/meminfo:
> >
> > HugePages_Total: 0
> > HugePages_Free:  0
> >
> > strikes me as odd, and risky.  I would have thought that usually, all
> > tasks in the system should see the same values in the files in /proc
> > (as opposed to the files in particular task subdirectories /proc/.)
> >
> > This patch strikes me as a bit of a hack, good for compatibility, but
> > hiding a booby trap that will bite some user code in the long run.
> >
> > But I'm not enough of an expert to know what the right tradeoffs are
> > in this matter.
> 
> Would annotating the Hugepages_* field with name of cpuset help?

There are existing programs which parse /proc/meminfo.  If we're going to
do any of this then it would need to be via new fields.

I don't think we should be altering the meaning of the HugePages fields
like this.  One can imagine scenarios in which such a change would cause
existing userspace scripts to fail.  Plus it's Just Weird to use
/proc/meminfo in this manner.

>  I
> originally thought that since cpuset's mems are hierarchical in memory
> assignment, it is fairly straightforward to understand what's going
> on: a parent cpuset's stats include its own and all of its children's.  For
> example, if the root cpuset has two sub-cpusets, job1 and job2, with 20
> and 30 htlb pages respectively, querying at each level gives:
> 
> [EMAIL PROTECTED] echo $$ > /dev/cpuset/tasks
> [EMAIL PROTECTED] grep HugePages_Total /proc/meminfo
> HugePages_Total:50
> 
> [EMAIL PROTECTED] echo $$ > /dev/cpuset/job1/tasks
> [EMAIL PROTECTED] grep HugePages_Total /proc/meminfo
> HugePages_Total:20
> 
> [EMAIL PROTECTED] echo $$ > /dev/cpuset/job2/tasks
> [EMAIL PROTECTED] grep HugePages_Total /proc/meminfo
> HugePages_Total:30
> 
> If this is odd, do you have any suggestions for alternative?

If it's per-cpuset information then shouldn't it be presented in
/dev/cpuset/something?


Re: console font limits

2007-05-03 Thread Antonino A. Daplas

On Thu, 2007-05-03 at 23:58 -0400, Daniel Hazelton wrote:
> On Thursday 03 May 2007 20:39:05 H. Peter Anvin wrote:
> > Kyle Moffett wrote:

> I guess I could start on that work again - shouldn't take me all that long to 
> recover the stuff I lost when a blackout caused my hard drive to get 
> corrupted beyond recovery (and the automated journal replay didn't do a 
> damned thing - I think it actually *added* to the corruption, but I don't 
> think any filesystem would have survived that)

You might want to look at the modesetting-101 branch of DRM.  Its goal
is similar to yours.  They even have a drm framebuffer.  I don't know
how far they are with their goal, but I can see some progress.

Here's their git tree:

git://git.freedesktop.org/git/mesa/drm#modesetting-101




Re: RHEL 3

2007-05-03 Thread Arjan van de Ven

On Fri, 2007-05-04 at 12:27 +0800, Majumder, Rajib wrote:
> Hi,

you're offtopic and are better off asking on a RH list

> 
> I am wondering if RHEL 3 (based on 2.4.21 kernel but RH claims they 
> backported lot of 2.6 kernel's feature into it) supports Multi-Core and 
> Hyperthreaded CPUs. 

it'll boot. it'll not work well.
> 
> Is the CPU-scheduler multi-core/

no

> hyperthreading aware? 

yes

> Is it aware ccNUMA

no




Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread Andrew Morton

On Thu, 3 May 2007 21:29:55 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> > +   ret = -EFBIG;
> > +   if (offset + len > inode->i_sb->s_maxbytes)
> > +   goto out_fput;
> 
> This code does handle offset+len going negative, but only by accident, I
> suspect.

But it doesn't handle offset+len wrapping through zero.
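
One way to make both cases explicit is to never form offset + len at all and compare len against the remaining room instead. The following is a standalone sketch, not the patch's code: range_fits is a made-up name, int64_t stands in for loff_t, and rejecting len <= 0 is an assumption about the desired policy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the byte range [offset, offset + len) fits below
 * maxbytes, 0 otherwise.  The sum offset + len is never computed,
 * so it cannot go negative or wrap through zero.
 * Assumption: negative offsets and non-positive lengths are rejected.
 */
int range_fits(int64_t offset, int64_t len, int64_t maxbytes)
{
    assert(maxbytes >= 0);

    if (offset < 0 || len <= 0)
        return 0;
    if (offset > maxbytes)
        return 0;
    /* maxbytes - offset is in [0, maxbytes], so this cannot overflow */
    return len <= maxbytes - offset;
}
```

A caller in the style of the quoted hunk would return -EFBIG whenever such a check fails.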

Re: [PATCH 3/8] Universal power supply class (was: battery class)

2007-05-03 Thread Shem Multinymous


On 5/3/07, Anton Vorontsov <[EMAIL PROTECTED]> wrote:

This class is result of "external power" and "battery" classes merge,
as suggested by David Woodhouse. He also implemented uevent support.


Looks great. In particular, the policies you've chosen for the
attributes and units are very reasonable.

I'll gladly accept patches moving tp_smapi to this interface (or
eventually do it myself when I have time).


A few minor points:


+#define POWER_SUPPLY_TECHNOLOGY_UNKNOWN 0
+#define POWER_SUPPLY_TECHNOLOGY_NIMH    1
+#define POWER_SUPPLY_TECHNOLOGY_LION    2
+#define POWER_SUPPLY_TECHNOLOGY_LIPO    3


Might as well add NiCd (common in UPS).



+#define POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN  0
+#define POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL 1
+#define POWER_SUPPLY_CAPACITY_LEVEL_LOW  2
+#define POWER_SUPPLY_CAPACITY_LEVEL_NORMAL   3
+#define POWER_SUPPLY_CAPACITY_LEVEL_HIGH 4
+#define POWER_SUPPLY_CAPACITY_LEVEL_FULL 5


Should this be synthesized by the driver if the hardware gives only
quantitative values? If so, maybe provide some guidelines.



+enum power_supply_type {
+   POWER_SUPPLY_TYPE_BATTERY = 0,
+   POWER_SUPPLY_TYPE_UPS,
+   POWER_SUPPLY_TYPE_AC,
+   POWER_SUPPLY_TYPE_USB,
+};


How about dumb (non-USB) DC power? Any reason to distinguish it from AC?

 Shem

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Ken Chen


On 5/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote:

Adding Christoph Lameter <[EMAIL PROTECTED]> to the cc list, as he knows
more about hugetlb pages than I do.

This patch strikes me as a bit odd.

Granted, it's solving what could be a touchy problem with a fairly
simple solution, which is usually a Good Thing(tm).

However, the idea that different tasks would see different values for
the following fields in /proc/meminfo:

HugePages_Total: 0
HugePages_Free:  0

strikes me as odd, and risky.  I would have thought that usually, all
tasks in the system should see the same values in the files in /proc
(as opposed to the files in particular task subdirectories /proc/.)

This patch strikes me as a bit of a hack, good for compatibility, but
hiding a booby trap that will bite some user code in the long run.

But I'm not enough of an expert to know what the right tradeoffs are
in this matter.


Would annotating the Hugepages_* field with name of cpuset help?  I
originally thought that since cpuset's mems are hierarchical in memory
assignment, it is fairly straightforward to understand what's going
on: a parent cpuset's stats include its own and all of its children's.  For
example, if the root cpuset has two sub-cpusets, job1 and job2, with 20
and 30 htlb pages respectively, querying at each level gives:

[EMAIL PROTECTED] echo $$ > /dev/cpuset/tasks
[EMAIL PROTECTED] grep HugePages_Total /proc/meminfo
HugePages_Total:50

[EMAIL PROTECTED] echo $$ > /dev/cpuset/job1/tasks
[EMAIL PROTECTED] grep HugePages_Total /proc/meminfo
HugePages_Total:20

[EMAIL PROTECTED] echo $$ > /dev/cpuset/job2/tasks
[EMAIL PROTECTED] grep HugePages_Total /proc/meminfo
HugePages_Total:30

If this is odd, do you have any suggestions for alternative?

- Ken

Re: Remove constructor from buffer_head

2007-05-03 Thread Andrew Morton

On Thu, 3 May 2007 20:34:48 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Thu, 3 May 2007, Andrew Morton wrote:
> 
> > On Thu, 3 May 2007 20:08:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > 
> > > Performance tests show a slight improvements in netperf (not a
> > > strong case for a performance improvement but removing the
> > > constructor has definitely no negative impact so why keep
> > > this around?).
> > > 
> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 
> > > Before:
> > >  87380  16384  16384    10.01    6026.04
> > >  87380  16384  16384    10.01    5992.17
> > >  87380  16384  16384    10.01    6071.23
> > > 
> > > After:
> > >  87380  16384  16384    10.01    6090.20
> > >  87380  16384  16384    10.01    6078.3
> > >  87380  16384  16384    10.00    6013.52
> > 
> > How could a filesystem change affect networking performance?
> > 
> > The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
> > or something like that.
> 
> H.. I was told in another thread that this is the most frequently used 
> slab for this benchmark

That would be hair-raising ;)  I suspect confusion with sk_buff.

buffer_heads do get used quite a bit though.  A good microbenchmark would
be to sit in a tight loop extending and truncating an ext2 file
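
A userspace loop of that shape might look like the sketch below. It is only an illustration: extend_truncate_loop is a made-up name, the path is whatever file the caller places on the filesystem under test, and a real measurement would wrap the call in a timer.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Repeatedly extend a file by writing `chunk` bytes at offset 0 and
 * then truncate it back to zero length.  Returns 0 on success and -1
 * on any I/O error.  Hypothetical helper for illustration only.
 */
int extend_truncate_loop(const char *path, int iterations, size_t chunk)
{
    char buf[4096];
    int fd, i;

    assert(path != NULL);
    assert(chunk > 0 && chunk <= sizeof(buf));
    memset(buf, 0xaa, sizeof(buf));

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    for (i = 0; i < iterations; i++) {
        /* extend: allocate blocks by writing real data */
        if (write(fd, buf, chunk) != (ssize_t)chunk)
            goto fail;
        /* truncate: give the blocks back */
        if (ftruncate(fd, 0) != 0)
            goto fail;
        /* rewind so the next write starts at offset 0 again */
        if (lseek(fd, 0, SEEK_SET) < 0)
            goto fail;
    }
    close(fd);
    return 0;

fail:
    close(fd);
    return -1;
}
```

Run against a file on an ext2 filesystem mounted from a ramdisk, with the whole run timed, this would exercise the allocation and release path without disk latency in the way.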


Re: Remove constructor from buffer_head

2007-05-03 Thread Christoph Lameter

On Thu, 3 May 2007, Andrew Morton wrote:

> The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
> or something like that.

Hmmm... How does one benchmark buffer head performance? Guess just by 
copying files? Not sure if the following will cut it.

Two tests. First copying 8M of small files into a 16M ramdisk:

for i in 1 2 3 4 5 6 7 8 9; do

mke2fs /dev/ram0 >/dev/null
mount /dev/ram0 /media >/dev/null
time cp -a /etc /media
umount /dev/ram0

done;


No constructor

real    0m0.104s
user    0m0.016s
sys     0m0.056s

real    0m0.090s
user    0m0.008s
sys     0m0.056s

real    0m0.089s
user    0m0.016s
sys     0m0.048s

real    0m0.097s
user    0m0.004s
sys     0m0.064s

real    0m0.091s
user    0m0.008s
sys     0m0.052s

real    0m0.091s
user    0m0.004s
sys     0m0.060s

real    0m0.098s
user    0m0.008s
sys     0m0.060s

real    0m0.091s
user    0m0.000s
sys     0m0.064s

real    0m0.090s
user    0m0.012s
sys     0m0.052s

W/constructor

real    0m0.099s
user    0m0.004s
sys     0m0.100s

real    0m0.098s
user    0m0.008s
sys     0m0.096s

real    0m0.091s
user    0m0.016s
sys     0m0.080s

real    0m0.091s
user    0m0.012s
sys     0m0.084s

real    0m0.090s
user    0m0.012s
sys     0m0.080s

real    0m0.090s
user    0m0.020s
sys     0m0.076s

real    0m1.269s
user    0m0.012s
sys     0m0.084s

real    0m0.095s
user    0m0.016s
sys     0m0.084s

real    0m0.096s
user    0m0.020s
sys     0m0.084s

The no constructor numbers are generally lower.
Lowest is no constructor with 0.089.

Second. Copy vmlinux (52M) to 128M ramdisk:

for i in 1 2 3 4 5 6 7 8 9; do

mke2fs /dev/ram0 >/dev/null
mount /dev/ram0 /media >/dev/null
time cp slub/vmlinux /media
umount /dev/ram0

done;


No constructor:

real    0m2.095s
user    0m0.000s
sys     0m0.168s

real    0m0.187s
user    0m0.008s
sys     0m0.124s

real    0m0.186s
user    0m0.008s
sys     0m0.120s

real    0m0.195s
user    0m0.008s
sys     0m0.128s

real    0m0.177s
user    0m0.004s
sys     0m0.120s

real    0m0.182s
user    0m0.004s
sys     0m0.120s

real    0m0.186s
user    0m0.008s
sys     0m0.120s

real    0m0.190s
user    0m0.004s
sys     0m0.128s

real    0m0.174s
user    0m0.004s
sys     0m0.116s


Constructor

real    0m0.183s
user    0m0.004s
sys     0m0.188s

real    0m0.183s
user    0m0.004s
sys     0m0.192s

real    0m0.177s
user    0m0.012s
sys     0m0.176s

real    0m0.186s
user    0m0.004s
sys     0m0.192s

real    0m0.187s
user    0m0.008s
sys     0m0.188s

real    0m0.184s
user    0m0.004s
sys     0m0.192s

real    0m0.177s
user    0m0.012s
sys     0m0.176s

real    0m0.183s
user    0m0.004s
sys     0m0.192s

real    0m0.182s
user    0m0.004s
sys     0m0.188s

Same here. Low is 0.174 no constructor.



Re: [PATCH 5/5] ext4: write support for preallocated blocks/extents

2007-05-03 Thread Andrew Morton

On Thu, 26 Apr 2007 23:46:23 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:

> This patch adds write support for preallocated (using fallocate system
> call) blocks/extents. The preallocated extents in ext4 are marked
> "uninitialized", hence they need special handling especially while
> writing to them. This patch takes care of that.
> 
> ...
>
>  /*
> + * ext4_ext_try_to_merge:
> + * tries to merge the "ex" extent to the next extent in the tree.
> + * It always tries to merge towards right. If you want to merge towards
> + * left, pass "ex - 1" as argument instead of "ex".
> + * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
> + * 1 if they got merged.

OK.

> + */
> +int ext4_ext_try_to_merge(struct inode *inode,
> + struct ext4_ext_path *path,
> + struct ext4_extent *ex)
> +{
> + struct ext4_extent_header *eh;
> + unsigned int depth, len;
> + int merge_done=0, uninitialized = 0;

space around "=", please.

Many people prefer not to do the multiple-definitions-per-line, btw:

int merge_done = 0;
int uninitialized = 0;

reasons:

- It gives you some space for a nice comment

- It makes patches much more readable, and it makes rejects easier to fix

- standardisation.

> + depth = ext_depth(inode);
> + BUG_ON(path[depth].p_hdr == NULL);
> + eh = path[depth].p_hdr;
> +
> + while (ex < EXT_LAST_EXTENT(eh)) {
> + if (!ext4_can_extents_be_merged(inode, ex, ex + 1))
> + break;
> + /* merge with next extent! */
> + if (ext4_ext_is_uninitialized(ex))
> + uninitialized = 1;
> + ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
> + + ext4_ext_get_actual_len(ex + 1));
> + if (uninitialized)
> + ext4_ext_mark_uninitialized(ex);
> +
> + if (ex + 1 < EXT_LAST_EXTENT(eh)) {
> + len = (EXT_LAST_EXTENT(eh) - ex - 1)
> + * sizeof(struct ext4_extent);
> + memmove(ex + 1, ex + 2, len);
> + }
> + eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries)-1);

Kernel convention is to put spaces around "-"

> + merge_done = 1;
> + BUG_ON(eh->eh_entries == 0);

eek, scary BUG_ON.  Do we really need to be that severe?  Would it be
better to warn and run ext4_error() here?

> + }
> +
> + return merge_done;
> +}
> +
> +
>
> ...
>
> +/*
> + * ext4_ext_convert_to_initialized:
> + * this function is called by ext4_ext_get_blocks() if someone tries to write
> + * to an uninitialized extent. It may result in splitting the uninitialized
> + * extent into multiple extents (upto three). Atleast one initialized extent
> + * and atmost two uninitialized extents can result.

There are some typos here

> + * There are three possibilities:
> + *   a> No split required: Entire extent should be initialized.
> + *   b> Split into two extents: Only one end of the extent is being written to.
> + *   c> Split into three extents: Somone is writing in middle of the extent.

and here

> + */
> +int ext4_ext_convert_to_initialized(handle_t *handle, struct inode *inode,
> + struct ext4_ext_path *path,
> + ext4_fsblk_t iblock,
> + unsigned long max_blocks)
> +{
> + struct ext4_extent *ex, *ex1 = NULL, *ex2 = NULL, *ex3 = NULL, newex;
> + struct ext4_extent_header *eh;
> + unsigned int allocated, ee_block, ee_len, depth;
> + ext4_fsblk_t newblock;
> + int err = 0, ret = 0;
> +
> + depth = ext_depth(inode);
> + eh = path[depth].p_hdr;
> + ex = path[depth].p_ext;
> + ee_block = le32_to_cpu(ex->ee_block);
> + ee_len = ext4_ext_get_actual_len(ex);
> + allocated = ee_len - (iblock - ee_block);
> + newblock = iblock - ee_block + ext_pblock(ex);
> + ex2 = ex;
> +
> + /* ex1: ee_block to iblock - 1 : uninitialized */
> + if (iblock > ee_block) {
> + ex1 = ex;
> + ex1->ee_len = cpu_to_le16(iblock - ee_block);
> + ext4_ext_mark_uninitialized(ex1);
> + ex2 = &newex;
> + }
> + /* for sanity, update the length of the ex2 extent before
> +  * we insert ex3, if ex1 is NULL. This is to avoid temporary
> +  * overlap of blocks.
> +  */
> + if (!ex1 && allocated > max_blocks)
> + ex2->ee_len = cpu_to_le16(max_blocks);
> + /* ex3: to ee_block + ee_len : uninitialised */
> + if (allocated > max_blocks) {
> + unsigned int newdepth;
> + ex3 = &newex;
> + ex3->ee_block = cpu_to_le32(iblock + max_blocks);
> + ext4_ext_store_pblock(ex3, newblock + max_blocks);
> + ex3->ee_len = cpu_to_le16(allocated - max_blocks);
> + ext4_ext_mark_uni

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-03 Thread Keith Packard

On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:

> It might be possible to find schemes that work around this. One way 
> could possibly be to have a buffer mapping -and validate order for 
> shared buffers.

If mapping never blocks on anything other than the fence, then there
isn't any dead lock possibility. What this says is that ordering of
rendering between clients is *not DRM's problem*. I think that's a good
solution though; I want to let multiple apps work on DRM-able memory
with their own CPU without contention.

I don't recall if Eric laid out the proposed rules, but:

 1) Map never blocks on map. Clients interested in dealing with this 
are on their own.

 2) Submit blocks on map. You must unmap all buffers before submitting
them. Doing the relocations in the kernel makes this all possible.

 3) Map blocks on the fence from submit. We can play with pending the
flush until the app asks for the buffer back, or we can play with
figuring out when flushes are useful automatically. Doesn't matter
if the policy is in the kernel.

I'm interested in making deadlock avoidance trivial and eliminating any
map-map contention.

-- 
[EMAIL PROTECTED]


Re: [PATCH 4/5] ext4: fallocate support in ext4

2007-05-03 Thread Andrew Morton

On Thu, 26 Apr 2007 23:43:32 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:

> This patch has the ext4 implementation of fallocate system call.
> 
> ...
>
> + /* ext4_can_extents_be_merged should have checked that either
> +  * both extents are uninitialized, or both aren't. Thus we
> +  * need to check only one of them here.
> +  */

Please always format multiline comments like this:

/*
 * ext4_can_extents_be_merged should have checked that either
 * both extents are uninitialized, or both aren't. Thus we
 * need to check only one of them here.
 */

> ...
>
> +/*
> + * ext4_fallocate:
> + * preallocate space for a file
> + * mode is for future use, e.g. for unallocating preallocated blocks etc.
> + */

This description is rather thin.  What is the filesystem's actual behaviour
here?  If the file is using extents then the implementation will do
[...].  If the file is using bitmaps then we will do [...].

But what?   Here is where it should be described.

> +int ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
> +{
> + handle_t *handle;
> + ext4_fsblk_t block, max_blocks;
> + int ret, ret2, nblocks = 0, retries = 0;
> + struct buffer_head map_bh;
> + unsigned int credits, blkbits = inode->i_blkbits;
> +
> + /* Currently supporting (pre)allocate mode _only_ */
> + if (mode != FA_ALLOCATE)
> + return -EOPNOTSUPP;
> +
> + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> + return -ENOTTY;

So we don't implement fallocate on bitmap-based files!  Well that's huge
news.  The changelog would be an appropriate place to communicate this,
along with reasons why, or a description of the plan to fix it.

Also, posix says nothing about fallocate() returning ENOTTY.

> + block = offset >> blkbits;
> + max_blocks = (EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits)
> +  - block;
> + mutex_lock(&EXT4_I(inode)->truncate_mutex);
> + credits = ext4_ext_calc_credits_for_insert(inode, NULL);
> + mutex_unlock(&EXT4_I(inode)->truncate_mutex);

Now I'm mystified.  Given that we're allocating an arbitrary amount of disk
space, and that this disk space will require an arbitrary amount of
metadata, how can we work out how much journal space we'll be needing
without at least looking at `len'?
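
For reference, the block range that the quoted lines derive from offset and len can be worked through in standalone C. This is a sketch under assumptions: the demo_ names are invented, uint64_t stands in for the kernel types, and the last-block form used here is an equivalent rewrite of the align-then-shift computation in the patch for len > 0.

```c
#include <assert.h>
#include <stdint.h>

/* First block touched and number of blocks covered by a byte range. */
struct demo_block_range {
    uint64_t first;
    uint64_t count;
};

/*
 * Compute the blocks covered by [offset, offset + len) for a block
 * size of (1 << blkbits).  Hypothetical helper, not ext4 code.
 */
struct demo_block_range demo_blocks_for_range(uint64_t offset, uint64_t len,
                                              unsigned int blkbits)
{
    struct demo_block_range r;
    uint64_t last;

    assert(blkbits < 32);
    assert(len > 0);
    assert(len - 1 <= UINT64_MAX - offset);    /* last byte must not wrap */

    r.first = offset >> blkbits;
    last = (offset + (len - 1)) >> blkbits;     /* block holding the last byte */
    r.count = last - r.first + 1;
    return r;
}
```

With 4096-byte blocks, offset 4095 and len 2 cover two blocks (0 and 1), so the amount of work, and presumably the metadata, grows with len.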

> + handle=ext4_journal_start(inode, credits +

Please always put spaces around "="

> + EXT4_DATA_TRANS_BLOCKS(inode->i_sb)+1);

And around "+"

> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> +retry:
> + ret = 0;
> + while (ret >= 0 && ret < max_blocks) {
> + block = block + ret;
> + max_blocks = max_blocks - ret;
> + ret = ext4_ext_get_blocks(handle, inode, block,
> +   max_blocks, &map_bh,
> +   EXT4_CREATE_UNINITIALIZED_EXT, 0);
> + BUG_ON(!ret);

BUG_ON is vicious.  Is it really justified here?  Possibly a WARN_ON and
ext4_error() would be safer and more useful here.

> + if (ret > 0 && test_bit(BH_New, &map_bh.b_state)

Use buffer_new() here.   A separate patch which fixes the three existing
instances of open-coded BH_foo usage would be appreciated.

> + && ((block + ret) > (i_size_read(inode) << blkbits)))

Check for wrap through the sign bit and through zero please.

> + nblocks = nblocks + ret;
> + }
> +
> + if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> + goto retry;
> +
> + /* Time to update the file size.
> +  * Update only when preallocation was requested beyond the file size.
> +  */

Fix comment layout.

> + if ((offset + len) > i_size_read(inode)) {

Both the lhs and the rhs here are signed.  Please review for possible
overflows through the sign bit and through zero.  Perhaps a comment
explaining why it's correct would be appropriate.

> + if (ret > 0) {
> + /* if no error, we assume preallocation succeeded completely */
> + mutex_lock(&inode->i_mutex);
> + i_size_write(inode, offset + len);
> + EXT4_I(inode)->i_disksize = i_size_read(inode);
> + mutex_unlock(&inode->i_mutex);
> + } else if (ret < 0 && nblocks) {
> + /* Handle partial allocation scenario */

The above two comments should be indented one additional tabstop.

> + loff_t newsize;
> + mutex_lock(&inode->i_mutex);
> + newsize  = (nblocks << blkbits) + i_size_read(inode);
> + i_size_write(inode, EXT4_BLOCK_ALIGN(newsize, blkbits));
> + EXT4_I(inode)->i_disksize = i_size_read(inode);
> + mut

Re: [PATCH 3/5] ext4: Extent overlap bugfix

2007-05-03 Thread Andrew Morton

On Thu, 26 Apr 2007 23:41:01 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:

> +unsigned int ext4_ext_check_overlap(struct inode *inode,
> + struct ext4_extent *newext,
> + struct ext4_ext_path *path)
> +{
> + unsigned long b1, b2;
> + unsigned int depth, len1;
> +
> + b1 = le32_to_cpu(newext->ee_block);
> + len1 = le16_to_cpu(newext->ee_len);
> + depth = ext_depth(inode);
> + if (!path[depth].p_ext)
> + goto out;
> + b2 = le32_to_cpu(path[depth].p_ext->ee_block);
> +
> + /* get the next allocated block if the extent in the path
> +  * is before the requested block(s) */
> + if (b2 < b1) {
> + b2 = ext4_ext_next_allocated_block(path);
> + if (b2 == EXT_MAX_BLOCK)
> + goto out;
> + }
> +
> + if (b1 + len1 > b2) {

Are we sure that b1+len cannot wrap through zero here?

> + newext->ee_len = cpu_to_le16(b2 - b1);
> + return 1;
> + }
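
The wrap question raised above about `b1 + len1` can be sidestepped by comparing the gap rather than the sum. What follows is a minimal stand-alone sketch, not the code from the patch: `trim_to_next_extent` is a hypothetical name, and it assumes 32-bit logical block numbers with `b2 >= b1` already established by the caller.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the overlap test.  b1 is the first
 * logical block of the new extent, len1 its length, and b2 the first
 * block of the next allocated extent.  The caller must guarantee
 * b2 >= b1.
 *
 * Returns the length the new extent may keep.
 */
static uint16_t trim_to_next_extent(uint32_t b1, uint16_t len1, uint32_t b2)
{
	/*
	 * "b1 + len1 > b2" can wrap through zero in 32-bit arithmetic.
	 * Comparing the gap (b2 - b1) with len1 asks the same question
	 * without ever forming the sum; the subtraction cannot wrap
	 * because b2 >= b1.
	 */
	uint32_t gap = b2 - b1;

	if (len1 > gap)
		return (uint16_t)gap;	/* overlap: truncate the new extent */
	return len1;			/* no overlap: keep the requested length */
}
```

With `b1 = 0xFFFFFFF0`, `len1 = 32` and `b2 = 0xFFFFFFF8`, the sum wraps to `0x10` and the naive comparison would miss the overlap, whereas the gap comparison trims the extent to 8 blocks.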

Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread Andrew Morton

On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:

> This patch implements the fallocate() system call and adds support for
> i386, x86_64 and powerpc.
> 
> ...
>
> +asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)

Please add a comment over this function which specifies its behaviour. 
Really it should be enough material from which a full manpage can be
written.

If that's all too much, this material should at least be spelled out in the
changelog.  Because there's no way in which this change can be fully
reviewed unless someone (ie: you) tells us what it is setting out to
achieve.

If we 100% implement some standard then a URL for what we claim to
implement would suffice.  Given that we're at least using different types from
posix I doubt if such a thing would be sufficient.

And given the complexity and potential variability within the filesystem
implementations of this, I'd expect that _something_ additional needs to be
said?

> +{
> + struct file *file;
> + struct inode *inode;
> + long ret = -EINVAL;
> +
> + if (len == 0 || offset < 0)
> + goto out;

The posix spec implies that negative `len' is permitted - presumably "allocate
ahead of `offset'".  How peculiar.

> + ret = -EBADF;
> + file = fget(fd);
> + if (!file)
> + goto out;
> + if (!(file->f_mode & FMODE_WRITE))
> + goto out_fput;
> +
> + inode = file->f_path.dentry->d_inode;
> +
> + ret = -ESPIPE;
> + if (S_ISFIFO(inode->i_mode))
> + goto out_fput;
> +
> + ret = -ENODEV;
> + if (!S_ISREG(inode->i_mode))
> + goto out_fput;

So we return ENODEV against an S_ISBLK fd, as per the posix spec.  That
seems a bit silly of them.

> + ret = -EFBIG;
> + if (offset + len > inode->i_sb->s_maxbytes)
> + goto out_fput;

This code does handle offset+len going negative, but only by accident, I
suspect.  It happens that s_maxbytes has unsigned type.  Perhaps a comment
here would settle the reader's mind.

> + if (inode->i_op && inode->i_op->fallocate)
> + ret = inode->i_op->fallocate(inode, mode, offset, len);
> + else
> + ret = -ENOSYS;

If we _are_ going to support negative `len', as posix suggests, I think we
should perform the appropriate sanity conversions to `offset' and `len'
right here, rather than expecting each filesystem to do it.

If we're not going to handle negative `len' then we should check for it.
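
A minimal sketch of the up-front checks discussed above, assuming the syscall chooses to reject negative `len` rather than support it. `check_falloc_range` and its `maxbytes` parameter are hypothetical names, and `int64_t` stands in for `loff_t`; this is not the code from the patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate an (offset, len) pair before handing it
 * to a filesystem.  int64_t stands in for the kernel's loff_t and
 * maxbytes for inode->i_sb->s_maxbytes.
 *
 * Returns 0 if the range is acceptable, or a negative errno.
 */
static int check_falloc_range(int64_t offset, int64_t len, uint64_t maxbytes)
{
	/* Reject negative offsets and zero or negative lengths explicitly. */
	if (offset < 0 || len <= 0)
		return -EINVAL;

	/*
	 * Both values are now non-negative, so INT64_MAX - len cannot
	 * overflow.  This catches the case where offset + len would wrap
	 * through the sign bit.
	 */
	if (offset > INT64_MAX - len)
		return -EFBIG;

	/* The sum is known to fit, so the unsigned comparison is safe. */
	if ((uint64_t)(offset + len) > maxbytes)
		return -EFBIG;

	return 0;
}
```

Doing the checks in this order means the result no longer depends on `s_maxbytes` happening to be unsigned, and each filesystem's `->fallocate` can assume a sane, positive range.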

> +out_fput:
> + fput(file);
> +out:
> + return ret;
> +}
> +EXPORT_SYMBOL(sys_fallocate);

I don't believe this needs to be exported to modules?

> +/*
> + * fallocate() modes
> + */
> +#define FA_ALLOCATE  0x1
> +#define FA_DEALLOCATE0x2

Now those aren't in posix.  They should be documented, along with their
expected semantics.

>  #ifdef __KERNEL__
>  
>  #include 
> @@ -1125,6 +1131,7 @@ struct inode_operations {
>   ssize_t (*listxattr) (struct dentry *, char *, size_t);
>   int (*removexattr) (struct dentry *, const char *);
>   void (*truncate_range)(struct inode *, loff_t, loff_t);
> + long (*fallocate)(struct inode *, int, loff_t, loff_t);

I really do think it's better to put the variable names in definitions such
as this.  Especially when we have two identically-typed variables next to
each other like that.  Quick: which one is the offset and which is the
length?
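
To illustrate the point about naming parameters, here is a small stand-alone sketch. `inode_ops_sketch`, `demo_fallocate` and `demo_ops` are hypothetical names used only for this example, and `int64_t` stands in for `loff_t`.

```c
#include <stdint.h>

struct inode;	/* opaque for the purposes of this sketch */

struct inode_ops_sketch {
	/* Unnamed form, as in the patch: which loff_t is which? */
	/* long (*fallocate)(struct inode *, int, int64_t, int64_t); */

	/* Named form: the prototype documents itself. */
	long (*fallocate)(struct inode *inode, int mode,
			  int64_t offset, int64_t len);
};

/* Toy implementation that just reports the end of the requested range. */
static long demo_fallocate(struct inode *inode, int mode,
			   int64_t offset, int64_t len)
{
	(void)inode;
	(void)mode;
	return (long)(offset + len);
}

static const struct inode_ops_sketch demo_ops = {
	.fallocate = demo_fallocate,
};
```

The compiler ignores the names in the member declaration, so they cost nothing; they exist purely for the reader.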


RHEL 3

2007-05-03 Thread Majumder, Rajib

Hi,

I am wondering if RHEL 3 (based on the 2.4.21 kernel, though RH claims they
backported a lot of the 2.6 kernel's features into it) supports multi-core and
hyperthreaded CPUs.

Is the CPU scheduler multi-core/hyperthreading aware? Is it aware of ccNUMA
multi-core CPUs?

Any input is appreciated. 

Thanks

Rajib




Re: 2.6.22 -mm merge plans -- vm bugfixes

2007-05-03 Thread Nick Piggin


Andrew Morton wrote:

On Thu, 03 May 2007 11:32:23 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:



void fastcall unlock_page(struct page *page)
{
+   VM_BUG_ON(!PageLocked(page));
smp_mb__before_clear_bit();
-   if (!TestClearPageLocked(page))
-   BUG();
-	smp_mb__after_clear_bit(); 
-	wake_up_page(page, PG_locked);

+   ClearPageLocked(page);
+   if (unlikely(test_bit(PG_waiters, &page->flags))) {
+   clear_bit(PG_waiters, &page->flags);
+   wake_up_page(page, PG_locked);
+   }
}



Why is that significantly faster than plain old wake_up_page(), which
tests waitqueue_active()?


Because it needs fewer barriers and doesn't touch a random hash
cacheline in the fastpath.
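
The idea can be sketched in userspace with C11 atomics. This is only an illustration of the fast path, not the kernel code: `page_sketch`, the bit values and `slow_wake` are hypothetical, and the real memory-ordering and sleep/wake race handling are left out.

```c
#include <stdatomic.h>

#define PG_LOCKED_BIT	(1u << 0)	/* hypothetical bit layout */
#define PG_WAITERS_BIT	(1u << 1)

struct page_sketch {
	atomic_uint flags;
	unsigned int wakeups;	/* counts trips into the slow wake path */
};

/* Stand-in for wake_up_page(): the part that hashes to a waitqueue. */
static void slow_wake(struct page_sketch *page)
{
	page->wakeups++;
}

static void lock_page_sketch(struct page_sketch *page)
{
	atomic_fetch_or(&page->flags, PG_LOCKED_BIT);
}

/* A sleeper would set this bit before blocking on the page lock. */
static void mark_waiter(struct page_sketch *page)
{
	atomic_fetch_or(&page->flags, PG_WAITERS_BIT);
}

static void unlock_page_sketch(struct page_sketch *page)
{
	/* Clear the lock bit and fetch the old flags in one atomic step. */
	unsigned int old = atomic_fetch_and(&page->flags, ~PG_LOCKED_BIT);

	/* Fast path: nobody flagged themselves as waiting, so skip the wake. */
	if (!(old & PG_WAITERS_BIT))
		return;

	atomic_fetch_and(&page->flags, ~PG_WAITERS_BIT);
	slow_wake(page);
}
```

In the uncontended case the unlock touches only the page's own flags word, which is already in cache; the hashed waitqueue is visited only when a waiter has announced itself.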

--
SUSE Labs, Novell Inc.

Re: console font limits

2007-05-03 Thread Daniel Hazelton

On Thursday 03 May 2007 20:39:05 H. Peter Anvin wrote:
> Kyle Moffett wrote:
> > Actually I think the real problem was that "KD_GRAPHICS" got overloaded
> > to mean "some userspace program is probably poking at the GPU in very
> > direct ways possibly including /dev/mem".  As such it really isn't safe
> > at all for the kernel to write stuff to the screen in that situation;
> > you could turn a panic()+reboot-after-30-secs into an unrecoverable hard
> > PCI bus lockup.  IIRC there were at least a couple chipsets which had
> > that problem with X.  If we can implement enough APIs for X to do all of
> > its stuff from userspace without iopl() or /dev/mem then we could
> > probably bring back the option for dumping oopses to screen in
> > KD_GRAPHICS mode, but otherwise it'll just cause more headaches.
>
> It never meant anything *BUT* that, to the best of my knowledge.  That
> was certainly the original meaning of KD_GRAPHICS.

I started work last year on making the framebuffer layer use the DRM internals 
for all controls, providing a unified kernel and userspace system for 
accessing the graphics devices. It never got anywhere because I couldn't 
figure out a simple system for figuring out which driver (out of the numerous 
ones that could potentially be compiled into the kernel) to actually give 
control to. (I know I could have just looped over them all and figured it out 
that way, but that is far from elegant)

I guess I could start on that work again - shouldn't take me all that long to 
recover the stuff I lost when a blackout caused my hard drive to get 
corrupted beyond recovery (and the automated journal replay didn't do a 
damned thing - I think it actually *added* to the corruption, but I don't 
think any filesystem would have survived that)

DRH

How can I debug a kernel pointer error?

2007-05-03 Thread ye janboe


Hi all!
I hit an issue where some code is changing one process's preempt_count.
preempt_count is changed to a very large number, for instance 0x300,
just before the finish_schedule function in schedule.

Can anyone give me some suggestions for debugging such a problem?

Thanks very much!

Janboe

Re: Routing 600+ vlan's via linux problems (looks like arp problems)

2007-05-03 Thread Øyvind Vågen Jægtnes

Hi again :)

On 5/4/07, Willy Tarreau <[EMAIL PROTECTED]> wrote:

On Thu, May 03, 2007 at 11:12:09PM +0200, Øyvind Vågen Jægtnes wrote:
> On 5/3/07, Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> >
> >On May 3 2007 22:53, Willy Tarreau wrote:
> >>> For the rest all we see in the arp cache is (incomplete)
> >>
> >>I suspect that your arp cache is full (128 entries by default).
> >>Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
> >>set it as high as gc_thresh2 (512 for me), and I don't know what
> >>happens above.
> >
> >Above, you will perhaps need the not-so-elegant userspace arpd :-/
>
> Yes, i was suspecting that the arp cache got full, but i will try
> increasing it :)
> Would there be any huge bugs if i change these lines in arp.c:
>
>.gc_thresh1 =   128,
>.gc_thresh2 =   512,
>
> to
>
>.gc_thresh1 =   700,
>.gc_thresh2 =   700,
>
> under the definition for struct arp_tbl?

I don't think it could cause a problem, but network people will surely
correct me if I'm wrong.

System is up and running perfectly now, it is routing everything at
about 200 mbps now with only 5% load avg with the above changes to
arp.c

So the real question now is, why is this number so low by default?
It would probably be much better if this could be handled dynamically
in the kernel.
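
For what it's worth, these thresholds are also exposed as sysctl files, so they can usually be raised at runtime without rebuilding arp.c. Below is a minimal sketch of reading and writing such a file; `read_threshold` and `write_threshold` are hypothetical helper names, and the path in the usage note is an assumption about where the files live on a given system.

```c
#include <stdio.h>

/* Read one integer from a sysctl-style file.  Returns 0 on success, -1 on failure. */
static int read_threshold(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write one integer to a sysctl-style file.  Returns 0 on success, -1 on failure. */
static int write_threshold(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%d\n", value) > 0);
	/* fclose() flushes the buffer; a failure here means the write did not land. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Run as root, something like `write_threshold("/proc/sys/net/ipv4/neigh/default/gc_thresh1", 700)` would apply the same value as the arp.c edit above; gc_thresh2 and gc_thresh3 sit in the same directory.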

> This setup will only run for about 1-2 hours while we fix the hardware
> router (it is running now, but only on a backup flash card solution.
> the harddrive in it died ;)

Huhhh! Please tell us exactly what make and model of ROUTER you are using
which embeds a HARD DRIVE, so that we recall never to buy that ! Having
seen uptimes of 5 years on moderately big access routers, I would have
found it awful to see them die multiple times in that timeframe because
of a crappy IDE drive inside !

It's a Juniper M7i
It comes default with a 5400 rpm laptop 2.5" harddrive but now we
bought a more robust "server" 2.5" harddrive. It still barfs on the OS
install, so the linux is doing all the job now. Will get a juniper guy
to come and fix :)

As a side note, i'm starting to wonder if it was worth the $20k when i
could just have a linux machine to do the job with a clone for backup
;)

regards
Øyvind Vågen Jægtnes
+47 96 22 03 08
[EMAIL PROTECTED]

Re: Serial 8250: clear the lsr_break_flag at open

2007-05-03 Thread Corey Minyard


Russell King wrote:


The backup code is something I never properly reviewed, so no comments
there.  The tx_empty code I assumed would be a relatively rare event,
except when closing the port (at which point you don't particularly care
about errors anyway, not even the break flag since chances are you'll
miss the following character.)
  

That "if" statement in the backup code does look a little dodgy, more
than is perhaps required.  I think it's correct, but I need to add
a lock there in my patch to protect the LSR check.

Given that people might want to poll it for various reasons, I guess
saving the status away should be done.  However, there's a slight issue
with working out which character the error is associated with.  Careful
locking may be the answer to that though.
  

I think as long as you hold the port lock while you grab the LSR and
set the saved flags it will work.
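
A rough userspace model of that idea, under stated assumptions: `port_sketch`, `read_lsr_hw` and the helper names are hypothetical, the register is simulated in memory, and a C11 `atomic_flag` spin stands in for the port spinlock. It is not the patch itself.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LSR_BREAK	0x10	/* break indicator, cleared by reading the LSR */
#define LSR_THRE	0x20	/* transmit holding register empty */
#define LSR_SAVE_MASK	LSR_BREAK

struct port_sketch {
	atomic_flag lock;	/* stand-in for the port spinlock */
	uint8_t hw_lsr;		/* simulated hardware register */
	uint8_t saved_flags;	/* error bits seen by readers that did not handle them */
};

static void port_lock(struct port_sketch *p)
{
	while (atomic_flag_test_and_set(&p->lock))
		;	/* spin until the flag is released */
}

static void port_unlock(struct port_sketch *p)
{
	atomic_flag_clear(&p->lock);
}

/* Simulated register read: returns the value and clears the error bits. */
static uint8_t read_lsr_hw(struct port_sketch *p)
{
	uint8_t lsr = p->hw_lsr;

	p->hw_lsr &= (uint8_t)~LSR_SAVE_MASK;
	return lsr;
}

/* Poll for transmitter-empty without losing error bits the read just cleared. */
static int tx_empty_sketch(struct port_sketch *p)
{
	uint8_t lsr;

	port_lock(p);
	lsr = read_lsr_hw(p);
	p->saved_flags |= lsr & LSR_SAVE_MASK;
	port_unlock(p);
	return (lsr & LSR_THRE) != 0;
}

/* The receive path collects whatever earlier readers stashed away. */
static uint8_t take_saved_flags(struct port_sketch *p)
{
	uint8_t flags;

	port_lock(p);
	flags = p->saved_flags;
	p->saved_flags = 0;
	port_unlock(p);
	return flags;
}
```

Because the read and the save happen under the same lock, a break seen by a tx_empty poll is still there for the receive path to report later, and no other LSR reader can slip in between the two steps.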

As for start_tx, yes, though slightly harder to check.  Maybe the code
should be modified to reduce the number of potential LSR reads by reading
the IIR first, and only if that shows no interrupt pending should the LSR
be read (and the error flags remembered.)
  

The version of start_tx in 2.6.21 does check IIR first, and it only
checks the LSR if UART_BUG_TXEN is set, so I assume that's not
a big deal.

I'll sleep on it tonight, look it over tomorrow morning, and resend the
patch.

Thanks,

-corey

Re: [RELEASE] Lguest for 2.6.21

2007-05-03 Thread Rusty Russell

On Thu, 2007-05-03 at 22:20 -0500, Matt Mackall wrote:
> I take it both sides of the virtual device drivers are turned on by
> the lguest option?

Yeah, to quote the code in drivers/lguest/lguest_bus.c:

/* At the moment we build all the drivers into the kernel because they're so
 * simple: 8144 bytes for all three of them as I type this.  And as the console
 * really needs to be built in, it's actually only 3527 bytes for the network
 * and block drivers.

> For the purposes of kernel hacking, I'd want to boot into one build
> and repeatedly launch another build as a guest, thereby getting
> faster hack/build/test cycles than either qemu or full reboot.
> How tightly coupled are things here?

I do that all the time, too.  The main issue is that we provide no ABI
for lguest (at least, not yet), so if you actually change guest/host
kernel version, you're on your own...

Thanks!
Rusty.


Re: Remove constructor from buffer_head

2007-05-03 Thread Christoph Lameter

On Thu, 3 May 2007, Andrew Morton wrote:

> On Thu, 3 May 2007 20:08:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
> wrote:
> 
> > Performance tests show a slight improvement in netperf (not a
> > strong case for a performance improvement but removing the
> > constructor has definitely no negative impact so why keep
> > this around?).
> > 
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
> > (127.0.0.1) port 0 AF_INET
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 
> > Before:
> >  87380  16384  16384    10.01    6026.04
> >  87380  16384  16384    10.01    5992.17
> >  87380  16384  16384    10.01    6071.23
> > 
> > After:
> >  87380  16384  16384    10.01    6090.20
> >  87380  16384  16384    10.01    6078.3
> >  87380  16384  16384    10.00    6013.52
> 
> How could a filesystem change affect networking performance?
> 
> The change looks nice, but I'd microbenchmark it with a 
> write-to-ext2-on-ramdisk
> or something like that.

Hmmm.. I was told in another thread that this is the most frequently used
slab for this benchmark .. Just accepted that as true.
 

Re: [PATCH] MM: use DIV_ROUND_UP() in mm/memory.c

2007-05-03 Thread Andrew Morton

On Tue, 24 Apr 2007 16:10:22 +0200 Rolf Eike Beer <[EMAIL PROTECTED]> wrote:

> This should make no difference in behaviour.
> 
> Signed-off-by: Rolf Eike Beer <[EMAIL PROTECTED]>
> 
> ---
> commit 64aa7c3136258d3abc76354b5f83b9a9575169c0
> tree 8037adc04b57cd6150456399b7caccf99489385a
> parent bf0bd376f79cadb4f8cd454db1723eb9be0aabc1
> author Rolf Eike Beer <[EMAIL PROTECTED]> Tue, 24 Apr 2007 16:05:40 +0200
> committer Rolf Eike Beer <[EMAIL PROTECTED]> Tue, 24 Apr 2007 16:05:40 
> +0200
> 
>  mm/memory.c |7 +++
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e7066e7..45bba1f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1838,12 +1838,11 @@ void unmap_mapping_range(struct address_space 
> *mapping,
>  {
>   struct zap_details details;
>   pgoff_t hba = holebegin >> PAGE_SHIFT;
> - pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + pgoff_t hlen = DIV_ROUND_UP(holelen, PAGE_SIZE);
>  
>   /* Check for overflow. */
>   if (sizeof(holelen) > sizeof(hlen)) {
> - long long holeend =
> - (holebegin + holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + long long holeend = DIV_ROUND_UP(holebegin + holelen, 
> PAGE_SIZE);
>   if (holeend & ~(long long)ULONG_MAX)
>   hlen = ULONG_MAX - hba + 1;
>   }
> @@ -2592,7 +2591,7 @@ int make_pages_present(unsigned long addr, unsigned 
> long 
> end)
>   write = (vma->vm_flags & VM_WRITE) != 0;
>   BUG_ON(addr >= end);
>   BUG_ON(end > vma->vm_end);
> - len = (end+PAGE_SIZE-1)/PAGE_SIZE-addr/PAGE_SIZE;
> + len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;
>   ret = get_user_pages(current, current->mm, addr,
>   len, write, 0, NULL, NULL);
>   if (ret < 0)

The patch is wordwrapped.  Please fix your MUA.

More seriously, on i386:

   text    data     bss     dec     hex filename
  15509      27      28   15564    3ccc mm/memory.o (before)
  15561      27      28   15616    3d00 mm/memory.o (after)

I'm not sure why - some of the quantities which we're dividing by there are
64-bit and perhaps the compiler has decided not to do shifting.

Please always check the before-and-after .text size from now on?

Now I'm worried about all the other DIV_ROUND_UP() conversions we did.  We
should get in there and work out why it went bad.
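
One plausible explanation, not checked against the generated assembly: on i386 `PAGE_SIZE` is a 32-bit unsigned long, so with a signed 64-bit operand the division is done as a signed 64-bit divide, which has to round toward zero for negative values and therefore cannot be emitted as a bare shift. The sketch below models that type mix; the `SKETCH_` constants and both function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
/* 32-bit unsigned, like PAGE_SIZE on i386. */
#define SKETCH_PAGE_SIZE	((uint32_t)1 << SKETCH_PAGE_SHIFT)

/* Same shape as the kernel macro. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Original form: add, then shift right. */
static int64_t pages_by_shift(int64_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/*
 * Converted form: int64_t combined with uint32_t is evaluated as int64_t,
 * so this is a signed 64-bit division by 4096.
 */
static int64_t pages_by_div(int64_t len)
{
	return DIV_ROUND_UP(len, SKETCH_PAGE_SIZE);
}
```

The two forms agree for every non-negative length, which is why the conversion makes no difference in behaviour; the compiler just cannot assume the operand is non-negative. Comparing `size` output or the disassembly of the two functions built with `-m32` would be one way to confirm or rule this out.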


Re: Remove constructor from buffer_head

2007-05-03 Thread Andrew Morton

On Thu, 3 May 2007 20:08:41 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> Performance tests show a slight improvement in netperf (not a
> strong case for a performance improvement but removing the
> constructor has definitely no negative impact so why keep
> this around?).
> 
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
> (127.0.0.1) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> Before:
>  87380  16384  16384    10.01    6026.04
>  87380  16384  16384    10.01    5992.17
>  87380  16384  16384    10.01    6071.23
> 
> After:
>  87380  16384  16384    10.01    6090.20
>  87380  16384  16384    10.01    6078.3
>  87380  16384  16384    10.00    6013.52

How could a filesystem change affect networking performance?

The change looks nice, but I'd microbenchmark it with a write-to-ext2-on-ramdisk
or something like that.

Re: [RELEASE] Lguest for 2.6.21

2007-05-03 Thread Matt Mackall

On Fri, May 04, 2007 at 10:43:09AM +1000, Rusty Russell wrote:
> On Fri, 2007-05-04 at 10:13 +1000, Rusty Russell wrote:
> > On Thu, 2007-05-03 at 11:02 -0500, Matt Mackall wrote:
> > > On Thu, May 03, 2007 at 12:43:48AM +1000, Rusty Russell wrote:
> > > > http://lguest.ozlabs.org/lguest-2.6.21-254.patch.gz
> > > > 
> > > > See Documentation/lguest/lguest.txt for how to run,
> > > > drivers/lguest/README for the draft code documentation journey.
> > > 
> > > Your lguest readme is quite lacking in the area of how to configure a
> > > guest kernel as opposed to the host kernel. More hand-holding, please.
> > 
> > Hi Matt!
> > 
> > Ah, that's because they are the same kernel.  Turning on CONFIG_LGUEST
> > builds-in the parts needed to be a guest as well.

Ok, I thought that might be a possibility.
 
> -- You will need to configure your kernel with the following options:
> +- Lguest runs the same kernel as guest and host.  You can configure
> +  them differently, but usually it's easiest not to.

I take it both sides of the virtual device drivers are turned on by
the lguest option?

For the purposes of kernel hacking, I'd want to boot into one build
and repeatedly launch another build as a guest, thereby getting
faster hack/build/test cycles than either qemu or full reboot.
How tightly coupled are things here?

-- 
Mathematics is the supreme nostalgia of our time.

Remove constructor from buffer_head

2007-05-03 Thread Christoph Lameter

Performance tests show a slight improvement in netperf (not a
strong case for a performance improvement but removing the
constructor has definitely no negative impact so why keep
this around?).

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) 
port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

Before:
 87380  16384  16384    10.01    6026.04
 87380  16384  16384    10.01    5992.17
 87380  16384  16384    10.01    6071.23

After:
 87380  16384  16384    10.01    6090.20
 87380  16384  16384    10.01    6078.3
 87380  16384  16384    10.00    6013.52


Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/buffer.c |   22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

Index: slub/fs/buffer.c
===
--- slub.orig/fs/buffer.c   2007-05-03 19:17:09.0 -0700
+++ slub/fs/buffer.c2007-05-03 19:57:30.0 -0700
@@ -2907,9 +2907,10 @@ static void recalc_bh_state(void)

 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
 {
-   struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+   struct buffer_head *ret = kmem_cache_zalloc(bh_cachep,
set_migrateflags(gfp_flags, __GFP_RECLAIMABLE));
if (ret) {
+   INIT_LIST_HEAD(&ret->b_assoc_buffers);
get_cpu_var(bh_accounting).nr++;
recalc_bh_state();
put_cpu_var(bh_accounting);
@@ -2928,17 +2929,6 @@ void free_buffer_head(struct buffer_head
 }
 EXPORT_SYMBOL(free_buffer_head);
 
-static void
-init_buffer_head(void *data, struct kmem_cache *cachep, unsigned long flags)
-{
-   if (flags & SLAB_CTOR_CONSTRUCTOR) {
-   struct buffer_head * bh = (struct buffer_head *)data;
-
-   memset(bh, 0, sizeof(*bh));
-   INIT_LIST_HEAD(&bh->b_assoc_buffers);
-   }
-}
-
 static void buffer_exit_cpu(int cpu)
 {
int i;
@@ -2965,12 +2955,8 @@ void __init buffer_init(void)
 {
int nrpages;
 
-   bh_cachep = kmem_cache_create("buffer_head",
-   sizeof(struct buffer_head), 0,
-   (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
-   SLAB_MEM_SPREAD),
-   init_buffer_head,
-   NULL);
+   bh_cachep = KMEM_CACHE(buffer_head,
+   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
 
/*
 * Limit the bh occupancy to 10% of ZONE_NORMAL

RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Christoph Lameter

Hmmm... I do not see a regression (up to date slub with all outstanding
patches applied). This is without any options enabled (but antifrag 
patches are present so slub_max_order=4 slub_min_objects=16) Could you 
post a .config? Missing patches against 2.6.21-rc7-mm2 can be found at 
http://ftp.kernel.org/pub/linux/kernel/peopl/christoph/slub-patches

slab

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    6068.61
 87380  16384  16384    10.01    5877.91
 87380  16384  16384    10.01    5835.68
 87380  16384  16384    10.01    5840.58

slub

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) 
port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.53    5646.53
 87380  16384  16384    10.01    6073.09
 87380  16384  16384    10.01    6094.68
 87380  16384  16384    10.01    6088.50



Re: Fw: [BUG 2.6.21-rc7] acpi_pm clocksource loses time on x86-64

2007-05-03 Thread john stultz

On Wed, 2007-05-02 at 11:10 -0700, john stultz wrote:
> On Sun, 2007-04-29 at 17:24 +0200, Mikael Pettersson wrote:
> > On Thu, 26 Apr 2007 15:42:44 -0700, john stultz wrote:
> > >Another shot in the dark:
> > >
> > >I wonder if the ACPI PM counter is halting in idle. Does booting w/
> > >idle=poll change the behavior? (Please do this while your laptop is
> > >plugged in, as it will run the cpu at full speed all the time).
> > 
> > Bingo!
> 
> Awesome! Finally, some progress! Thanks again for putting up w/ all my
> testing requests.
> 
> > I booted the x86-64 2.6.21 final kernel with idle=poll and let the
> > laptop idle for an hour. The ondemand cpufreq governor did reduce
> > the CPU's clock frequency, but that shouldn't have affected the
> > chipset or the ACPI PM counter.
> > 
> > Anyway, after 60 minutes `date' and `hwclock' were still in perfect
> > sync and matched actual time.
> > 
> > Any ideas why this halting in idle doesn't happen with the 32-bit kernel?
> 
> No clue. Time to ask Len. :)
> 
> Hey Len,
>   So that slow acpi_pm on x86_64 seems to be connected w/ the idle loop.
> I'm guessing the chipset halts the ACPI PM in lower C states. Do you
> have any guesses as to what might differ between x86_64 and i386 ACPI
> idle loops? Or might this be something different in what the BIOS
> exports in x86_64 mode or i386 mode?

Mikael,
Just trying to dig a bit more through the acpi_processor_idle code.
Could you run "cat /proc/acpi/processor/CPU1/power" and reply w/ the
output?

thanks
-john





[PATCH] Stop ignoring argument in drivers/network/b44.c

2007-05-03 Thread Matthew Martin


This patch uses the phy_id variable in b44_readphy and b44_writephy.

Signed-off-by: Matthew Martin <[EMAIL PROTECTED]>
---

--- vanilla-linux-2.6.21-git4/drivers/net/b44.c 2007-05-03 11:16:21.0 
-0500
+++ linux-2.6.21-git4/drivers/net/b44.c 2007-05-03 17:02:39.0 -0500
@@ -327,45 +327,59 @@ static void b44_enable_ints(struct b44 *
bw32(bp, B44_IMASK, bp->imask);
 }
 
-static int b44_readphy(struct b44 *bp, int reg, u32 *val)
+static int b44_readphy(struct b44 *bp, int reg, u32 *val, int phy_addr)
 {
int err;
 
bw32(bp, B44_EMAC_ISTAT, EMAC_INT_MII);
-   bw32(bp, B44_MDIO_DATA, (MDIO_DATA_SB_START |
-(MDIO_OP_READ << MDIO_DATA_OP_SHIFT) |
-(bp->phy_addr << MDIO_DATA_PMD_SHIFT) |
-(reg << MDIO_DATA_RA_SHIFT) |
-(MDIO_TA_VALID << MDIO_DATA_TA_SHIFT)));
+   
+   if (!phy_addr)  
+   bw32(bp, B44_MDIO_DATA, (MDIO_DATA_SB_START |
+(MDIO_OP_READ << MDIO_DATA_OP_SHIFT) |
+(bp->phy_addr << MDIO_DATA_PMD_SHIFT) |
+(reg << MDIO_DATA_RA_SHIFT) |
+(MDIO_TA_VALID << MDIO_DATA_TA_SHIFT)));
+   else
+   bw32(bp, B44_MDIO_DATA, (MDIO_DATA_SB_START |
+(MDIO_OP_READ << MDIO_DATA_OP_SHIFT) |
+(phy_addr << MDIO_DATA_PMD_SHIFT) |
+(reg << MDIO_DATA_RA_SHIFT) |
+(MDIO_TA_VALID << MDIO_DATA_TA_SHIFT)));
+
err = b44_wait_bit(bp, B44_EMAC_ISTAT, EMAC_INT_MII, 100, 0);
*val = br32(bp, B44_MDIO_DATA) & MDIO_DATA_DATA;
 
return err;
 }
 
-static int b44_writephy(struct b44 *bp, int reg, u32 val)
+static int b44_writephy(struct b44 *bp, int reg, u32 val, int phy_addr)
 {
bw32(bp, B44_EMAC_ISTAT, EMAC_INT_MII);
-   bw32(bp, B44_MDIO_DATA, (MDIO_DATA_SB_START |
-(MDIO_OP_WRITE << MDIO_DATA_OP_SHIFT) |
-(bp->phy_addr << MDIO_DATA_PMD_SHIFT) |
-(reg << MDIO_DATA_RA_SHIFT) |
-(MDIO_TA_VALID << MDIO_DATA_TA_SHIFT) |
-(val & MDIO_DATA_DATA)));
+   
+   if (!phy_addr)  
+   bw32(bp, B44_MDIO_DATA, (MDIO_DATA_SB_START |
+(MDIO_OP_WRITE << MDIO_DATA_OP_SHIFT) |
+(bp->phy_addr << MDIO_DATA_PMD_SHIFT) |
+(reg << MDIO_DATA_RA_SHIFT) |
+(MDIO_TA_VALID << MDIO_DATA_TA_SHIFT) |
+(val & MDIO_DATA_DATA)));
+   else
+   bw32(bp, B44_MDIO_DATA, (MDIO_DATA_SB_START |
+(MDIO_OP_WRITE << MDIO_DATA_OP_SHIFT) |
+(phy_addr << MDIO_DATA_PMD_SHIFT) |
+(reg << MDIO_DATA_RA_SHIFT) |
+(MDIO_TA_VALID << MDIO_DATA_TA_SHIFT) |
+(val & MDIO_DATA_DATA)));
+
return b44_wait_bit(bp, B44_EMAC_ISTAT, EMAC_INT_MII, 100, 0);
 }
 
 /* miilib interface */
-/* FIXME FIXME: phy_id is ignored, bp->phy_addr use is unconditional
- * due to code existing before miilib use was added to this driver.
- * Someone should remove this artificial driver limitation in
- * b44_{read,write}phy.  bp->phy_addr itself is fine (and needed).
- */
 static int b44_mii_read(struct net_device *dev, int phy_id, int location)
 {
u32 val;
struct b44 *bp = netdev_priv(dev);
-   int rc = b44_readphy(bp, location, &val);
+   int rc = b44_readphy(bp, location, &val, phy_id);
if (rc)
return 0x;
return val;
@@ -375,7 +389,7 @@ static void b44_mii_write(struct net_dev
 int val)
 {
struct b44 *bp = netdev_priv(dev);
-   b44_writephy(bp, location, val);
+   b44_writephy(bp, location, val, phy_id);
 }
 
 static int b44_phy_reset(struct b44 *bp)
@@ -383,11 +397,11 @@ static int b44_phy_reset(struct b44 *bp)
u32 val;
int err;
 
-   err = b44_writephy(bp, MII_BMCR, BMCR_RESET);
+   err = b44_writephy(bp, MII_BMCR, BMCR_RESET, 0);
if (err)
return err;
udelay(100);
-   err = b44_readphy(bp, MII_BMCR, &val);
+   err = b44_readphy(bp, MII_BMCR, &val, 0);
if (!err) {
if (val & BMCR_RESET) {
printk(KERN_ERR PFX "%s: PHY Reset would not 
complete.\n",
@@ -446,15 +460,15 @@ static int b44_setup_phy(struct b44 *bp)
u32 val;
int err;
 
-   if ((err = b44_readphy(bp, B44_MII_ALEDCTRL, &val)) != 0)
+   if ((err = b44_readphy(bp, B44_MII_ALEDCTRL, &val, 0))

Re: [PATCH] tty add compat_ioctl

2007-05-03 Thread Paul Fulghum


Paul Fulghum wrote:

Arnd Bergmann wrote:
- In your driver you don't get the big kernel lock in the 
compat_ioctl function. I assume that this is correct for

the particular driver, but it may be nice if you could
consequently also add an unlocked_ioctl function that can
be used without the BKL for native ioctls. It would be good
to hear an opinion on this from someone who has an insight
in tty locking issues though, so I'm Cc:ing some people
who have touched that recently.


I don't count on higher level locking for
synchronization issues specific to the driver.

I thought the current compat_ioctl() was already
meant to *not* have the BKL just like unlocked_ioctl.
My thought was that any driver getting a recent update
like compat_ioctl() would need to be reviewed for BKL
safety and take the lock manually if necessary.


Nevermind. I misread what you wrote (I'm tired).
Yes, adding an unlocked_ioctl() makes sense.

--
Paul

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Detecting process death for anycast named process monitoring

2007-05-03 Thread David M. Lloyd

On Wed, 2007-05-02 at 16:30 -0600, Chris Friesen wrote:
> Glen Turner wrote:
> 
> > The question is, how can a process with no relationship to another
> > process detect that process unexpectedly dying?  If named goes
> > away to a better place, we want to shut down the interface
> > which causes Quagga to inject the anycast route.

> We did something similar where arbitrary processes can register to be 
> sent an arbitrary signal when the state of other processes change.

What about something like inotify, but for processes?  That would be
cool...

- DML


RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Christoph Lameter

Hmmm... One potential issue is the complicated way the slab is
handled. Could you try this patch and see what impact it has?

If it has any then remove the cacheline alignment and see how that
influences things.


Remove constructor from buffer_head

Buffer head management uses a constructor which increases overhead
for object handling. Remove the constructor. That way SLUB can place
the freepointer in an optimal location instead of after the object
in potentially another cache line.

Also having no constructor makes allocation and disposal of slabs
from the page allocator much easier since no pass over the objects
allocated to call constructors is necessary. SLUB can directly begin by
serving the first object.

Plus it simplifies the code and removes a difficult to understand
element for buffer handling.

Align the buffer heads on cacheline boundaries for best performance.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/buffer.c |   22 --
 include/linux/buffer_head.h |2 +-
 2 files changed, 5 insertions(+), 19 deletions(-)

Index: slub/fs/buffer.c
===
--- slub.orig/fs/buffer.c   2007-04-30 22:03:21.0 -0700
+++ slub/fs/buffer.c2007-05-03 18:37:47.0 -0700
@@ -2907,9 +2907,10 @@ static void recalc_bh_state(void)

 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
 {
-   struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+   struct buffer_head *ret = kmem_cache_zalloc(bh_cachep,
set_migrateflags(gfp_flags, __GFP_RECLAIMABLE));
if (ret) {
+   INIT_LIST_HEAD(&ret->b_assoc_buffers);
get_cpu_var(bh_accounting).nr++;
recalc_bh_state();
put_cpu_var(bh_accounting);
@@ -2928,17 +2929,6 @@ void free_buffer_head(struct buffer_head
 }
 EXPORT_SYMBOL(free_buffer_head);
 
-static void
-init_buffer_head(void *data, struct kmem_cache *cachep, unsigned long flags)
-{
-   if (flags & SLAB_CTOR_CONSTRUCTOR) {
-   struct buffer_head * bh = (struct buffer_head *)data;
-
-   memset(bh, 0, sizeof(*bh));
-   INIT_LIST_HEAD(&bh->b_assoc_buffers);
-   }
-}
-
 static void buffer_exit_cpu(int cpu)
 {
int i;
@@ -2965,12 +2955,8 @@ void __init buffer_init(void)
 {
int nrpages;
 
-   bh_cachep = kmem_cache_create("buffer_head",
-   sizeof(struct buffer_head), 0,
-   (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
-   SLAB_MEM_SPREAD),
-   init_buffer_head,
-   NULL);
+   bh_cachep = KMEM_CACHE(buffer_head,
+   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
 
/*
 * Limit the bh occupancy to 10% of ZONE_NORMAL
Index: slub/include/linux/buffer_head.h
===
--- slub.orig/include/linux/buffer_head.h   2007-05-03 18:40:51.0 -0700
+++ slub/include/linux/buffer_head.h2007-05-03 18:41:07.0 -0700
@@ -73,7 +73,7 @@ struct buffer_head {
struct address_space *b_assoc_map;  /* mapping this buffer is
   associated with */
atomic_t b_count;   /* users using this buffer_head */
-};
+} cacheline_aligned_in_smp;
 
 /*
  * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()

Re: [v4l-dvb-maintainer] [PATCH 35/36] Use menuconfig objects II - DVB

2007-05-03 Thread Trent Piepho

On Fri, 4 May 2007, Roman Zippel wrote:
> I don't quite understand. With the menuconfig changes more menu entries
> should  appear on the left side, so I don't understand why you have to
> "drill down" to reach it.
> The rule for menu to appear on the left side is relatively simple - all
> its parents must be of menu type as well. So if a menuconfig is on the
> right side it must have a normal config entry as parent.

I think that's it.  The media tree was done with options to select the core
system module, then a menuconfig that depended on that which the drivers
were under.

> > > menuconfig 
> > > if 
> > > [all the other options]
> > > endif
> > >
> > > Into this:
> > >
> > > menuconfig 
> > > [all the other options]
> > > endmenu
> > >
> > > The reason is that a frontend would easily be able to understand the
> > > coupling between the "menuconfig " and the "if ".  It will make it
> > > easier for the frontend to see that all the options are inside and
> > > controlled by the enclosing menuconfig.
>
> If the frontend wants to change the behaviour of a menuconfig, it can
> already do that, so this doesn't require a syntax change.

How about these examples:

menuconfig FOO
if FOO
config A
depends on FOO
endif
config B
if FOO
config C
depends on FOO
endif

Or this:
menu FOO
menuconfig BAR
config A
menuconfig BAZ
config B
endmenu

How does it show the first one, keeping the config entries in the correct
order and put them into the menu at the same time?

And which of these should the second be shown as?

foo
\-bar
  \-baz

or

foo
|-bar
\-baz

There is no question with menus, as the menu tree is clearly lexically
defined by the matching menu / endmenu pairs.  But menuconfig doesn't work
that way, and it seems like it would make more sense if it did.

Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

2007-05-03 Thread Paul Jackson

Adding Christoph Lameter <[EMAIL PROTECTED]> to the cc list, as he knows
more about hugetlb pages than I do.

This patch strikes me as a bit odd.

Granted, it's solving what could be a touchy problem with a fairly
simple solution, which is usually a Good Thing(tm).

However, the idea that different tasks would see different values for
the following fields in /proc/meminfo:

HugePages_Total: 0
HugePages_Free:  0

strikes me as odd, and risky.  I would have thought that usually, all
tasks in the system should see the same values in the files in /proc
(as opposed to the files in particular task subdirectories /proc/.)

This patch strikes me as a bit of a hack, good for compatibility, but
hiding a booby trap that will bite some user code in the long run.

But I'm not enough of an expert to know what the right tradeoffs are
in this matter.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: [linux-dvb] DST/BT878 module customization (.. was: Critical points about ...)

2007-05-03 Thread Uwe Bugla

 Original Message 
Date: Fri, 04 May 2007 02:31:49 +0400
From: Manu Abraham <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: linux-kernel@vger.kernel.org
Subject: Re: [linux-dvb] DST/BT878 module customization (.. was: Critical points about ...)

> Markus Rechberger wrote:
> 
> > I mean the mail from Helge Hafting (thread  [linux-dvb] Critical
> > points about kernel 2.6.21 and pseudo-authorities) at the very first
> > beginning.
> > 
> 
> I am replying to this mail, just because someone's spreading lies all
> around.
> On the mentioned thread, what i wrote (and that was the only mail from
> my side):
> 
> There is a saying: "He who lives by the sword, dies by the sword."

Hi Manu,

The saying that you stated is a very christian one.
I perhaps should state that I am 47 years old now, raised in an utmost
reactionary region called Bavaria (Western Germany), and also raised by parents
of Russian / Polish origin who shared the Nazi regime with the usual
"I-do-not-want-to-talk-about-it-and-I-do-not-want-to-feel-responsible-about-it"
behaviour.
And I am very much not only interested in german post-war history, but I simply 
love to write provocative letters or mails to make my conviction utmost clear 
that all this capitalist bullshit around us should vanish and shrink and be 
overcome some day.

Basic christian ideals are very close to basic marxist ideas.
The one who never does perceive that is a real poor human being in my eyes, if 
not to say: a complete idiot or a system-conforming hypocrite.

BUT:
I in fact do not read this "saying" for the first time:

In my personal experience (feel very sorry about it, but it's true) it has 
always truthfully been an excuse for persons being strongly limited on what I 
would call utmost primitive instincts like greed or rapacity (i. e. the utmost 
perfect sounding "would-like-to-capitalists", if not to say: the perfect slaves 
or: the perfect counterrevolutionaries or strike-breakers, if not to say: the 
utmost perfect asscreepers).

Please forgive me for that statement, but I am simply stating my personal 
experiences very truthfully, without playing any politics, but just telling you 
my "personal truth" or the sum of all my personal life experience unfortunately 
bound to that.

And if there is discussion needed on that we should do it private or anyway on 
some other thread, but definitely not on this one.

Hints to help you to understand the difference:

1. There is a GPL license written by Richard Stallman whose origin I do not 
know:
Its essence is the philosophy to share and to be highly transparent as far as 
information level is concerned.

2. There is a saying by Linus in which he states the best choice he ever did 
was conforming his work to the terms of Richard Stallman, the GPL.

3. Wikipedia says that Linus's father was no christian at all, but simply a 
communist.

See, Manu, there are deeply primitive instinct-driven hypocrites around like 
hell, but there are also truthful human beings around.

But:
The Internet does not provide a platform to find out who is who and what is 
what.
The Internet may be necessary, but in the end it's just a drag, isn't it?

Sincerely
Uwe
> 
> 
>  Original Message 
> Subject: Re: [linux-dvb] Re: Critical points about kernel 2.6.21  and
> pseudo-authorities
> Date: Tue, 01 May 2007 04:19:41 +0400
> From: Manu Abraham <[EMAIL PROTECTED]>
> To: Uwe Bugla <[EMAIL PROTECTED]>
> CC: [EMAIL PROTECTED],  [EMAIL PROTECTED],
> linux-kernel@vger.kernel.org,  [EMAIL PROTECTED],
> [EMAIL PROTECTED],  [EMAIL PROTECTED]
> References: <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>   <[EMAIL PROTECTED]
> <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
> 
> Uwe Bugla wrote:
> 
> > 1. You utmost personally are responsible for 4 unusable kernels, as
> > far as bt8xx cards are concerned: 2.6.13, 2.6.14, 2.6.15, 2.6.16!
> > 2. You did not even want to imply to resolve that issue by incarnating
> > that "community and synergy principle" that linux community needs to
> > exist at all, but you just perverted it by flaming capable people -
> 
> You mean like this:
> 
> 
>  Original Message 
> Subject: kernel patch practice in 2.6.13-mm2
> Date: Tue, 13 Sep 2005 16:46:35 +0200 (MEST)
> From: Uwe Bugla <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> CC: [EMAIL PROTECTED]
> 
> Hi,
> if you continue to send or sign mm-patches for Kernel 2.6.13 as a
> consequence of a design change I would appreciate you to stop rubbing out
> my name.
> You did that in a file called /Documentation/dvb/bt8xx.txt.
> My objective is understandable good documentation, even if it may

Re: [patch] compiler: introduce used and maybe_unused

2007-05-03 Thread David Rientjes

__used is defined to be __attribute__((unused)) for all pre-3.4 gcc
compilers to suppress warnings for unused functions because perhaps they
are referenced only in inline assembly.  It is defined to be 
__attribute__((used)) for gcc 3.4 and later so that the code is still
emitted for such functions.

__maybe_unused is defined to be __attribute__((unused)) for both function
and variable use if it could possibly be unreferenced due to the
evaluation of preprocessor macros.  Function prototypes shall be marked
with __maybe_unused if the actual definition of the function is dependent
on preprocessor macros.

No update to compiler-intel.h is necessary because ICC supports both
__attribute__((used)) and __attribute__((unused)) as specified by the
gcc manual.

__attribute_used__ is deprecated and will be removed once all current
code is converted to using __used.

Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/compiler-gcc.h  |1 +
 include/linux/compiler-gcc3.h |6 --
 include/linux/compiler-gcc4.h |3 ++-
 include/linux/compiler.h  |   21 ++---
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,3 +37,4 @@
 #define  noinline  __attribute__((noinline))
 #define __attribute_pure__ __attribute__((pure))
 #define __attribute_const__    __attribute__((__const__))
+#define __maybe_unused __attribute__((unused))
diff --git a/include/linux/compiler-gcc3.h b/include/linux/compiler-gcc3.h
--- a/include/linux/compiler-gcc3.h
+++ b/include/linux/compiler-gcc3.h
@@ -4,9 +4,11 @@
 #include 
 
 #if __GNUC_MINOR__ >= 3
-# define __attribute_used__    __attribute__((__used__))
+# define __used                __attribute__((__used__))
+# define __attribute_used__    __used  /* deprecated */
 #else
-# define __attribute_used__    __attribute__((__unused__))
+# define __used                __attribute__((__unused__))
+# define __attribute_used__    __used  /* deprecated */
 #endif
 
 #if __GNUC_MINOR__ >= 4
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -12,7 +12,8 @@
 # define __inline  __inline __attribute__((always_inline))
 #endif
 
-#define __attribute_used__ __attribute__((__used__))
+#define __used __attribute__((__used__))
+#define __attribute_used__ __used  /* deprecated */
 #define __must_check   __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 #define __always_inline        inline __attribute__((always_inline))
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -108,15 +108,30 @@ extern void __chk_io_ptr(const void __iomem *);
  * Allow us to avoid 'defined but not used' warnings on functions and data,
  * as well as force them to be emitted to the assembly file.
  *
- * As of gcc 3.3, static functions that are not marked with attribute((used))
- * may be elided from the assembly file.  As of gcc 3.3, static data not so
+ * As of gcc 3.4, static functions that are not marked with attribute((used))
+ * may be elided from the assembly file.  As of gcc 3.4, static data not so
  * marked will not be elided, but this may change in a future gcc version.
  *
+ * NOTE: Because distributions shipped with a backported unit-at-a-time
+ * compiler in gcc 3.3, we must define __used to be __attribute__((used))
+ * for gcc >=3.3 instead of 3.4.
+ *
  * In prior versions of gcc, such functions and data would be emitted, but
  * would be warned about except with attribute((unused)).
+ *
+ * Mark functions that are referenced only in inline assembly as __used so
+ * the code is emitted even though it appears to be unreferenced.
  */
 #ifndef __attribute_used__
-# define __attribute_used__    /* unimplemented */
+# define __attribute_used__    /* deprecated */
+#endif
+
+#ifndef __used
+# define __used                /* unimplemented */
+#endif
+
+#ifndef __maybe_unused
+# define __maybe_unused        /* unimplemented */
 #endif
 
 /*

Re: [PATCH] make cancel_rearming_delayed_work() reliable

2007-05-03 Thread Andrew Morton

On Fri, 4 May 2007 00:42:26 +0400
Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> Thanks to Jarek Poplawski for the ideas and for spotting the bug in the
> initial draft patch.
> 
> cancel_rearming_delayed_work() currently has many limitations, because it
> requires that dwork always re-arms itself via queue_delayed_work(). So it
> hangs forever if dwork doesn't do this, or cancel_rearming_delayed_work/
> cancel_delayed_work was already called. It uses flush_workqueue() in a loop,
> so it can't be used if workqueue was frozen, and it is potentially
> live-lockable on busy system if delay is small.
> 
> With this patch cancel_rearming_delayed_work() doesn't make any assumptions
> about dwork, it can re-arm itself via queue_delayed_work(), or queue_work(),
> or do nothing.
> 
> As a "side effect", cancel_work_sync() was changed to handle re-arming works
> as well.
> 
> Disadvantages:
> 
>   - this patch adds wmb() to insert_work().
> 
>   - slows down the fast path (when del_timer() succeeds on entry) of
> cancel_rearming_delayed_work(), because wait_on_work() is called
> unconditionally. In that case, compared to the old version, we are
> doing "unneeded" lock/unlock for each online CPU.
> 
> On the other hand, this means we don't need to use cancel_work_sync()
> after cancel_rearming_delayed_work().
> 
>   - complicates the code (.text grows by 130 bytes).
> 

hm, this is getting complex.

> + while (!try_to_grab_pending(work))
> + ;

The patch adds a couple of spinloops.  Normally we put a cpu_relax() into
such loops.  It can make a very large difference under some circumstances.


> + while (!del_timer(&dwork->timer) &&
> +!try_to_grab_pending(&dwork->work))
> + ;
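
The cpu_relax() suggestion above can be illustrated with a minimal user-space sketch. Everything named demo_* here is hypothetical: demo_cpu_relax() is only a compiler barrier standing in for the kernel's cpu_relax(), and demo_try_grab() is a toy replacement for try_to_grab_pending() that fails a fixed number of times. The inline asm assumes a GCC-compatible compiler.

```c
#include <stdbool.h>

/* Stand-in for the kernel's cpu_relax(); here only a compiler barrier. */
static inline void demo_cpu_relax(void)
{
	__asm__ __volatile__("" ::: "memory");
}

struct demo_work {
	int busy_polls;		/* failed polls left before the grab succeeds */
};

/* Hypothetical try-operation: fails until busy_polls reaches zero. */
static bool demo_try_grab(struct demo_work *w)
{
	if (w->busy_polls > 0) {
		w->busy_polls--;
		return false;
	}
	return true;
}

/* Spin until the grab succeeds, relaxing between attempts. */
static int demo_grab_spin(struct demo_work *w)
{
	int spins = 0;

	while (!demo_try_grab(w)) {
		demo_cpu_relax();	/* the call the review asks for */
		spins++;
	}
	return spins;
}
```

The only change relative to an empty loop body is the relax call between attempts; the loop condition and its result are unaffected.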


Re: 2.6.22 -mm merge plans: slub on PowerPC

2007-05-03 Thread Christoph Lameter

On Fri, 4 May 2007, Benjamin Herrenschmidt wrote:

> > The SLUB allocator relies on struct page fields first_page and slab,
> > overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
> > be used for the lowest level of pagetable pages.  This was obstructing
> > SLUB on PowerPC, which uses kmem_caches for its pagetables.  So convert
> > its pte level to use quicklist pages (whereas pmd, pud and 64k-page pgd
> > want partpages, so continue to use kmem_caches for pmd, pud and pgd).
> > But to keep up appearances for pgtable_free, we still need PTE_CACHE_NUM.
> 
> Interesting... I'll have a look asap.

I would also recommend looking at removing the constructors for the 
remaining slabs. A constructor requires that SLUB never touch the object 
(same situation as is resulting from enabling debugging). So it must 
increase the object size in order to put the free pointer after the 
object. In case of a order of 2 cache this has a particularly bad effect 
of doubling object size. If the objects can be overwritten on free (no 
constructor) then we can use the first word of the object as a freepointer 
on kfree. Meaning we can use a hot cacheline so no cache miss. On 
alloc we have already touched the first cacheline which also avoids a 
cacheline fetch there. This is the optimal way of operation for SLUB.
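
As a rough illustration of the size effect described above, here is a small sketch of per-object slot size. It is a simplified model, not SLUB's actual layout code: the demo_* names are hypothetical, and reading "order of 2 cache" as a cache of power-of-two sized, size-aligned objects is an assumption.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t demo_align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the space one object occupies in a slab.
 * may_overlap != 0: a free object's first word holds the free pointer.
 * may_overlap == 0: (constructor or debugging) the free pointer is
 *                   appended after the object.
 */
static size_t demo_slot_size(size_t object_size, size_t align, int may_overlap)
{
	size_t size = object_size;

	if (!may_overlap)
		size += sizeof(void *);
	return demo_align_up(size, align);
}
```

In this model a 64-byte object aligned to 64 bytes occupies 64 bytes when the free pointer may overlap it, and 128 bytes when the pointer has to be appended.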

Hmmm... We could add an option to allow the use of a constructor while
keeping the free pointer at the beginning of the object? Then we would 
have to zap the first word on alloc. Would work like quicklists.

Add SLAB_FREEPOINTER_MAY_OVERLAP?


Re: [PATCH] synclink_gt add compat_ioctl

2007-05-03 Thread Andrew Morton

On Thu, 03 May 2007 13:01:17 -0500
Paul Fulghum <[EMAIL PROTECTED]> wrote:

> Add compat_ioctl handler to synclink_gt driver.
> 
> The one case requiring a separate 32 bit handler could be
> removed by redefining the associated structure in
> a way compatible with both 32 and 64 bit systems. But that
> approach would break existing native 64 bit user applications.
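
A minimal user-space sketch of why a separate 32-bit structure is needed, under stated assumptions: uint64_t stands in for the native unsigned long of a 64-bit kernel and uint32_t for compat_ulong_t. The demo_* names and the three-field layout are hypothetical and far smaller than the real MGSL_PARAMS.

```c
#include <stdint.h>

/* Layout a native 64-bit application would pass. */
struct demo_params {
	uint64_t mode;		/* stands in for unsigned long on LP64 */
	unsigned char loopback;
	unsigned short flags;
};

/* Layout a 32-bit application would pass to the same ioctl. */
struct demo_params32 {
	uint32_t mode;		/* stands in for compat_ulong_t */
	unsigned char loopback;
	unsigned short flags;
};

/* Field-by-field widening, as a compat handler would have to do. */
static void demo_params_from32(struct demo_params *dst,
			       const struct demo_params32 *src)
{
	dst->mode = src->mode;
	dst->loopback = src->loopback;
	dst->flags = src->flags;
}
```

Because the two layouts differ in size and field offsets, the 32-bit data has to be copied field by field rather than reinterpreted in place.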


I made a few changes here...


From: Andrew Morton <[EMAIL PROTECTED]>

- Fix i386 build:

In file included from drivers/char/synclink_gt.c:85:
include/linux/synclink.h:175: error: expected specifier-qualifier-list before 'compat_ulong_t'

- We might as well do the same ifdef-avoidery trick around compat_ioctl()
  too.  That required that it be renamed.

- It is fishy that apart from one outlier in kexec.h, synclink.h is the
  only header file which uses compat_ulong_t.  Are we doing this right?
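
The "ifdef-avoidery trick" from the list above can be sketched outside the kernel as follows. DEMO_CONFIG_COMPAT, struct demo_ops and the demo_* functions are hypothetical stand-ins for CONFIG_COMPAT, struct tty_operations and the driver's handlers.

```c
#include <stddef.h>

struct demo_ops {
	long (*ioctl)(unsigned int cmd, unsigned long arg);
	long (*compat_ioctl)(unsigned int cmd, unsigned long arg);
};

static long demo_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)arg;
	return (long)cmd;
}

#ifdef DEMO_CONFIG_COMPAT
static long demo_compat_ioctl(unsigned int cmd, unsigned long arg)
{
	return demo_ioctl(cmd, arg);
}
#else
/* No handler is built: the name collapses to a null pointer. */
#define demo_compat_ioctl NULL
#endif

static const struct demo_ops demo_ops_table = {
	.ioctl = demo_ioctl,
	.compat_ioctl = demo_compat_ioctl,	/* no #ifdef needed here */
};
```

The conditional lives next to the function definition, so the operations table itself stays free of preprocessor blocks. This works only if the handler has a name that is not already used elsewhere, which is why the patch renames the function.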

Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Arnd Bergmann <[EMAIL PROTECTED]>
Cc: Paul Fulghum <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/char/synclink_gt.c |   16 +---
 include/linux/synclink.h   |5 +++--
 2 files changed, 12 insertions(+), 9 deletions(-)

diff -puN drivers/char/synclink_gt.c~synclink_gt-add-compat_ioctl-fix drivers/char/synclink_gt.c
--- a/drivers/char/synclink_gt.c~synclink_gt-add-compat_ioctl-fix
+++ a/drivers/char/synclink_gt.c
@@ -1176,15 +1176,16 @@ static int ioctl(struct tty_struct *tty,
 }
 
 #ifdef CONFIG_COMPAT
-static long compat_ioctl(struct tty_struct *tty, struct file *file,
+static long synclink_compat_ioctl(struct tty_struct *tty, struct file *file,
 unsigned int cmd, unsigned long arg)
 {
struct slgt_info *info = tty->driver_data;
int rc = -ENOIOCTLCMD;
 
-   if (sanity_check(info, tty->name, "compat_ioctl"))
+   if (sanity_check(info, tty->name, "synclink_compat_ioctl"))
return -ENODEV;
-   DBGINFO(("%s compat_ioctl() cmd=%08X\n", info->device_name, cmd));
+   DBGINFO(("%s synclink_compat_ioctl() cmd=%08X\n",
+   info->device_name, cmd));
 
switch (cmd) {
 
@@ -1219,9 +1220,12 @@ static long compat_ioctl(struct tty_stru
break;
}
 
-   DBGINFO(("%s compat_ioctl() cmd=%08X rc=%d\n", info->device_name, cmd, rc));
+   DBGINFO(("%s synclink_compat_ioctl() cmd=%08X rc=%d\n",
+   info->device_name, cmd, rc));
return rc;
 }
+#else
+#define synclink_compat_ioctl NULL
 #endif
 
 /*
@@ -3554,9 +3558,7 @@ static const struct tty_operations ops =
.chars_in_buffer = chars_in_buffer,
.flush_buffer = flush_buffer,
.ioctl = ioctl,
-#ifdef CONFIG_COMPAT
-   .compat_ioctl = compat_ioctl,
-#endif
+   .compat_ioctl = synclink_compat_ioctl,
.throttle = throttle,
.unthrottle = unthrottle,
.send_xchar = send_xchar,
diff -puN include/linux/synclink.h~synclink_gt-add-compat_ioctl-fix include/linux/synclink.h
--- a/include/linux/synclink.h~synclink_gt-add-compat_ioctl-fix
+++ a/include/linux/synclink.h
@@ -169,9 +169,9 @@ typedef struct _MGSL_PARAMS
 
 } MGSL_PARAMS, *PMGSL_PARAMS;
 
+#ifdef CONFIG_COMPAT
 /* provide 32 bit ioctl compatibility on 64 bit systems */
-struct MGSL_PARAMS32
-{
+struct MGSL_PARAMS32 {
compat_ulong_t  mode;
unsigned char   loopback;
unsigned short  flags;
@@ -186,6 +186,7 @@ struct MGSL_PARAMS32
unsigned char   stop_bits;
unsigned char   parity;
 };
+#endif
 
 #define MICROGATE_VENDOR_ID 0x13c0
 #define SYNCLINK_DEVICE_ID 0x0010
_


RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Christoph Lameter

On Thu, 3 May 2007, Chen, Tim C wrote:

> We are still seeing a 5% regression on TCP streaming with
> slub_min_objects set at 16 and a 10% regression for Volanomark, after
> increasing slub_min_objects to 16 and setting slub_max_order=4 and using
> the 2.6.21-rc7-mm2 kernel.  The performance between slub_min_objects=8
> and 16 are similar.

Ok. We then need to look at partial list management. It could be that the 
sequence of partials is reversed. The problem is that I do not really 
have time to concentrate on performance right now. Stability comes 
first. We will likely end up putting some probes in there to find out 
where the overhead comes from.

> > Check slabinfo output for the network slabs and see what order is
> > used. The number of objects per slab is important for performance.
> 
> The order used is 0 for the buffer_head, which is the most used object.
> 
> I think they are 104 bytes per object.

Hmmm... Then it was not affected by slub_max_order? Try
slub_min_order=1 or 2 to increase that?
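
As rough arithmetic on the numbers in this exchange, the sketch below estimates how many objects fit in a slab of a given order. It assumes 4 KiB pages and ignores any per-slab metadata or alignment padding. The demo_* names are hypothetical and this is not how SLUB itself computes the value.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Objects that fit in a slab of 2^order pages, ignoring all overhead. */
static size_t demo_objects_per_slab(unsigned int order, size_t object_size)
{
	return ((size_t)DEMO_PAGE_SIZE << order) / object_size;
}
```

Under those assumptions an order-0 slab of 104-byte objects already holds 39 of them, more than either slub_min_objects value tried above, which would be consistent with the order staying at 0. Orders 1 and 2 would hold 78 and 157.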

Re: [PATCH] tty add compat_ioctl

2007-05-03 Thread Paul Fulghum


Arnd Bergmann wrote:
> - The return value of the new compat_ioctl methods should probably
> be 'int', not 'long'. We've had the discussion before and then
> decided not to change the existing compat_ioctl and
> unlocked_ioctl functions -- even though int is more appropriate,
> but having the same prototype has the advantage that a driver
> can use the same function for both ->ioctl and ->compat_ioctl
> if all calls are compatible.

I noticed that but thought the change in return value type
had some higher purpose I had not perceived. If it can be int
that would be the way to go.

> - In your driver you don't get the big kernel lock in the
> compat_ioctl function. I assume that this is correct for
> the particular driver, but it may be nice if you could
> consequently also add an unlocked_ioctl function that can
> be used without the BKL for native ioctls. It would be good
> to hear an opinion on this from someone who has an insight
> in tty locking issues though, so I'm Cc:ing some people
> who have touched that recently.


I don't count on higher level locking for
synchronization issues specific to the driver.

I thought the current compat_ioctl() was already
meant to *not* have the BKL just like unlocked_ioctl.
My thought was that any driver getting a recent update
like compat_ioctl() would need to be reviewed for BKL
safety and take the lock manually if necessary.

Drivers that are falling behind won't have a compat_ioctl
defined at all.

--
Paul


Re: [patch] compiler: introduce used and maybe_unused

2007-05-03 Thread Adrian Bunk

On Thu, May 03, 2007 at 05:35:57PM -0700, David Rientjes wrote:
>...
> There was a mistake in the current implementation of __attribute_used__
> whereas it would be defined to be __attribute__((used)) incorrectly for
> gcc 3.3 and later.  The unit-at-a-time compilation scheme was only
> introduced in gcc 3.4 and later versions as specified in 
> http://www.gnu.org/software/gcc/gcc-3.4/changes.html.
>...

AFAIR, Suse shipped a release of their distribution with a gcc 3.3 
containing a backported unit-at-a-time.
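
For readers unfamiliar with the two attributes under discussion, here is a minimal sketch for GCC-compatible compilers. The demo_* macros are hypothetical stand-ins for the proposed __used and __maybe_unused, and DEMO_FEATURE is an invented configuration switch. The sketch does not reproduce the gcc 3.3 versus 3.4 behaviour debated in this thread.

```c
/* Hypothetical stand-ins for the proposed kernel macros. */
#define demo_used		__attribute__((used))
#define demo_maybe_unused	__attribute__((unused))

/* Emitted into the object file even though no C code references it,
 * e.g. because only inline assembly would call it. */
static int demo_used demo_helper_for_asm(void)
{
	return 42;
}

/* Referenced only when DEMO_FEATURE is defined; the attribute
 * suppresses the "defined but not used" warning otherwise. */
static int demo_maybe_unused demo_feature_helper(int x)
{
	return x + 1;
}

int demo_entry(int x)
{
#ifdef DEMO_FEATURE
	return demo_feature_helper(x);
#else
	return x;
#endif
}
```

The first attribute is about keeping code that looks unreferenced, the second only about silencing a warning. That difference is why the patch gives them separate names.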

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


[PATCH v2] lib/hexdump

2007-05-03 Thread Randy Dunlap


> > Ho hum.  Perhaps a middle ground is to implement hexdump-to-memory as the
> > core function.  hex_dumper() becomes a simple wrapper around that.  (but
> > how big is its buffer?  One line would be OK, I guess)
> 
> Yeah, I almost did it that way.  We'll see.
> 
> > > OK, that's one way to do it.  I'll wait a bit for other comments.
> > 
> > Good luck ;)

next try:


From: Randy Dunlap <[EMAIL PROTECTED]>

Based on ace_dump_mem() from Grant Likely for the Xilinx 
SystemACE CompactFlash interface.

Add print_hex_dump() & hex_dumper() to lib/hexdump.c and linux/kernel.h.

This patch adds the functions print_hex_dump() & hex_dumper().
print_hex_dump() can be used to perform a hex + ASCII dump of data to syslog,
in an easily viewable format, thus providing a common text hex dump format.

hex_dumper() provides a dump-to-memory function.  It converts one "line"
of output (16 bytes of input) at a time.

Example usages:
print_hex_dump(KERN_DEBUG, DUMP_PREFIX_ADDRESS, frame->data, 
frame->len);
hex_dumper(frame->data, frame->len, linebuf, sizeof(linebuf));

Example output using %DUMP_PREFIX_OFFSET:
0009ab42: 40414243 44454647 48494a4b 4c4d4e4f-@ABCDEFG HIJKLMNO
Example output using %DUMP_PREFIX_ADDRESS:
88089af0: 70717273 74757677 78797a7b 7c7d7e7f-pqrstuvw xyz{|}~.
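
The line format shown in the example output can be reproduced with a small user-space sketch. demo_hex_line() is a hypothetical analogue of the hex_dumper() in this patch, not the patch's code: it adds a guard for a zero-length output buffer and uses the C library's isprint().

```c
#include <ctype.h>
#include <stddef.h>

/* Convert up to 16 bytes of buf to "hex-ASCII" text in out (NUL-terminated). */
static void demo_hex_line(const void *buf, size_t len, char *out, size_t outlen)
{
	static const char digits[] = "0123456789abcdef";
	const unsigned char *p = buf;
	size_t j, lx = 0;

	if (outlen == 0)
		return;
	/* hex part: two digits per byte, a space after every fourth byte */
	for (j = 0; j < 16 && j < len && lx + 3 < outlen; j++) {
		if (j && !(j % 4))
			out[lx++] = ' ';
		out[lx++] = digits[p[j] >> 4];
		out[lx++] = digits[p[j] & 0x0f];
	}
	if (lx + 1 < outlen)
		out[lx++] = '-';
	/* ASCII part: non-printable bytes become '.', a space after 8 bytes */
	for (j = 0; j < 16 && j < len && lx + 2 < outlen; j++) {
		out[lx++] = isprint(p[j]) ? (char)p[j] : '.';
		if (j == 7)
			out[lx++] = ' ';
	}
	out[lx] = '\0';
}
```

Each loop checks the remaining space before writing, so a short output buffer truncates the line instead of overflowing it.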

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---

 include/linux/kernel.h |   10 
 lib/Makefile   |2 
 lib/hexdump.c  |  105 +
 3 files changed, 116 insertions(+), 1 deletion(-)

--- linux-2.6.21-git4.orig/include/linux/kernel.h
+++ linux-2.6.21-git4/include/linux/kernel.h
@@ -202,6 +202,16 @@ extern enum system_states {
 
 extern void dump_stack(void);
 
+enum {
+   DUMP_PREFIX_NONE,
+   DUMP_PREFIX_ADDRESS,
+   DUMP_PREFIX_OFFSET
+};
+extern void hex_dumper(void *buf, size_t len, char *linebuf, size_t linebuflen);
+extern void print_hex_dump(const char *level, int prefix_type,
+   void *buf, size_t len);
+#define hextoasc(x)    "0123456789abcdef"[x]
+
 #ifdef DEBUG
 /* If you are writing a driver, please use dev_dbg instead */
 #define pr_debug(fmt,arg...) \
--- linux-2.6.21-git4.orig/lib/Makefile
+++ linux-2.6.21-git4/lib/Makefile
@@ -13,7 +13,7 @@ lib-$(CONFIG_SMP) += cpumask.o
 lib-y  += kobject.o kref.o kobject_uevent.o klist.o
 
 obj-y += div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
-bust_spinlocks.o
+bust_spinlocks.o hexdump.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
--- /dev/null
+++ linux-2.6.21-git4/lib/hexdump.c
@@ -0,0 +1,105 @@
+/*
+ * lib/hexdump.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation. See README and COPYING for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * hex_dumper - convert a blob of data to "hex ASCII" in memory
+ * @buf: data blob to dump
+ * @len: number of bytes in the @buf
+ * @linebuf: where to put the converted data
+ * @linebuflen: total size of @linebuf, including space for terminating NUL
+ *
+ * hex_dumper() works on one "line" of output at a time, i.e.,
+ * 16 bytes of input data converted to hex + ASCII output.
+ *
+ * Given a buffer of u8 data, hex_dumper() converts the input data to a
+ * hex + ASCII dump at the supplied memory location.
+ * The converted output is always NUL-terminated.
+ *
+ * E.g.:
+ * hex_dumper(frame->data, frame->len, linebuf, sizeof(linebuf));
+ *
+ * Prints the offsets of the block of memory, not addresses:
+ * 0009ab42: 40414243 44454647 48494a4b 4c4d4e4f-@ABCDEFG HIJKLMNO
+ */
+void hex_dumper(void *buf, size_t len, char *linebuf, size_t linebuflen)
+{
+   const u8 *ptr = buf;
+   u8 ch;
+   int j, lx = 0;
+
+   for (j = 0; (j < 16) && (j < len) && (lx + 3) < linebuflen; j++) {
+   if (j && !(j % 4))
+   linebuf[lx++] = ' ';
+   ch = ptr[j];
+   linebuf[lx++] = hextoasc(ch >> 4);
+   linebuf[lx++] = hextoasc(ch & 0x0f);
+   }
+   if (lx < linebuflen)
+   linebuf[lx++] = '-';
+   for (j = 0; (j < 16) && (j < len) && (lx + 2) < linebuflen; j++) {
+   linebuf[lx++] = isprint(ptr[j]) ?  ptr[j] : '.';
+   if (j == 7)
+   linebuf[lx++] = ' ';
+   }
+   linebuf[lx++] = '\0';
+}
+EXPORT_SYMBOL(hex_dumper);
+
+/**
+ * print_hex_dump - print a text hex dump to syslog for a binary blob of data
+ * @level: kernel log level (e.g. KERN_DEBUG)
+ * @prefix_type: controls whether prefix of an offset, address, or none
+ *  is printed (%DUMP_PREFIX_OFFSET, %DUMP_PREFIX_ADDRESS, %DUMP_PREFIX_NONE)
+ * @buf: data blob to dump
+ * @len: number of bytes in the @buf
+ *
+ * Given a buffer of u8 data, print_hex_dump() prints a hex + ASCII dump
+ * to the ker

Re: [linux-dvb] DST/BT878 module customization (.. was: Critical points about ...)

2007-05-03 Thread hermann pitton

On Friday, 04.05.2007 at 02:31 +0400, Manu Abraham wrote:
> Markus Rechberger wrote:
> 
> > I mean the mail from Helge Hafting (thread  [linux-dvb] Critical
> > points about kernel 2.6.21 and pseudo-authorities) at the very first
> > beginning.
> > 
> 
> I am replying to this mail, just because someone's spreading lies all
> around.
> On the mentioned thread, what i wrote (and that was the only mail from
> my side):
> 
> There is a saying: "He who lives by the sword, dies by the sword."
> 

Within the last six years there was in the end exactly one, never asked
for, private mail with worst *bullshit* about another person, Mauro in
this case.

It came from you, out of any feasible arguments for me anymore.

I'm stupid, but not stupid enough to allow such stuff coming in rule.

But I still say you have been first and are waiting longest to get your
work in, please try again to get your ACKs and rant about not enough
replies.

Cheers,
Hermann

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RELEASE] Lguest for 2.6.21

2007-05-03 Thread Rusty Russell

On Fri, 2007-05-04 at 10:13 +1000, Rusty Russell wrote:
> On Thu, 2007-05-03 at 11:02 -0500, Matt Mackall wrote:
> > On Thu, May 03, 2007 at 12:43:48AM +1000, Rusty Russell wrote:
> > >   http://lguest.ozlabs.org/lguest-2.6.21-254.patch.gz
> > > 
> > > See Documentation/lguest/lguest.txt for how to run,
> > > drivers/lguest/README for the draft code documentation journey.
> > 
> > Your lguest readme is quite lacking in the area of how to configure a
> > guest kernel as opposed to the host kernel. More hand-holding, please.
> 
> Hi Matt!
> 
>   Ah, that's because they are the same kernel.  Turning on CONFIG_LGUEST
> builds-in the parts needed to be a guest as well.
> 
> Thanks for pointing out that weakness.  I will modify lguest.txt to make
> that clear.

Something like this:

diff -r 940ec1c6ac5a Documentation/lguest/lguest.txt
--- a/Documentation/lguest/lguest.txt   Thu May 03 23:00:19 2007 +1000
+++ b/Documentation/lguest/lguest.txt   Fri May 04 10:17:23 2007 +1000
@@ -23,7 +23,10 @@ Developer features:
 
 Running Lguest:
 
-- You will need to configure your kernel with the following options:
+- Lguest runs the same kernel as guest and host.  You can configure
+  them differently, but usually it's easiest not to.
+
+  You will need to configure your kernel with the following options:
 
   CONFIG_HIGHMEM64G=n ("High Memory Support" "64GB")[1]
   CONFIG_TUN=y/m ("Universal TUN/TAP device driver support")

Cheers,
Rusty.



Re: console font limits

2007-05-03 Thread H. Peter Anvin

Kyle Moffett wrote:
> 
> Actually I think the real problem was that "KD_GRAPHICS" got overloaded
> to mean "some userspace program is probably poking at the GPU in very
> direct ways possibly including /dev/mem".  As such it really isn't safe
> at all for the kernel to write stuff to the screen in that situation;
> you could turn a panic()+reboot-after-30-secs into an unrecoverable hard
> PCI bus lockup.  IIRC there were at least a couple chipsets which had
> that problem with X.  If we can implement enough APIs for X to do all of
> its stuff from userspace without iopl() or /dev/mem then we could
> probably bring back the option for dumping oopses to screen in
> KD_GRAPHICS mode, but otherwise it'll just cause more headaches.
> 

It never meant anything *BUT* that, to the best of my knowledge.  That
was certainly the original meaning of KD_GRAPHICS.

-hpa

[patch] compiler: introduce __used and __maybe_unused

2007-05-03 Thread David Rientjes

__used is defined to be __attribute__((unused)) for all pre-3.4 gcc
compilers to suppress warnings for unused functions because perhaps they
are referenced only in inline assembly.  It is defined to be 
__attribute__((used)) for gcc 3.4 and later so that the code is still
emitted for such functions.

There was a mistake in the current implementation of __attribute_used__:
it was incorrectly defined to be __attribute__((used)) for gcc 3.3 and
later.  The unit-at-a-time compilation scheme was only introduced in
gcc 3.4, as specified in
http://www.gnu.org/software/gcc/gcc-3.4/changes.html.

__maybe_unused is defined to be __attribute__((unused)) for both function
and variable use if it could possibly be unreferenced due to the
evaluation of preprocessor macros.  Function prototypes shall be marked
with __maybe_unused if the actual definition of the function is dependent
on preprocessor macros.

No update to compiler-intel.h is necessary because ICC supports both
__attribute__((used)) and __attribute__((unused)) as specified by the
gcc manual.

__attribute_used__ is deprecated and will be removed once all current
code is converted to using __used.
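
As a rough illustration of the two cases described above, here is a userspace
sketch for gcc-compatible compilers. DEMO_USED and DEMO_MAYBE_UNUSED are
stand-in names chosen for this example (they mirror what the patch defines as
__used and __maybe_unused for gcc 3.4 and later), and DEMO_FEATURE and the
demo_* functions are hypothetical.

```c
#include <assert.h>

/* Stand-ins for the patch's __used and __maybe_unused (gcc >= 3.4 case). */
#define DEMO_USED          __attribute__((__used__))
#define DEMO_MAYBE_UNUSED  __attribute__((__unused__))

/*
 * In real code this would be referenced only from inline assembly, so the
 * compiler sees no C caller.  DEMO_USED asks it to emit the function anyway.
 */
static DEMO_USED int demo_asm_only_helper(void)
{
	return 42;
}

/*
 * Called only when DEMO_FEATURE is defined.  DEMO_MAYBE_UNUSED suppresses
 * the "defined but not used" warning in builds where it is not.
 */
static DEMO_MAYBE_UNUSED int demo_feature_helper(int x)
{
	return x + 1;
}

static int demo_entry(int x)
{
#ifdef DEMO_FEATURE
	return demo_feature_helper(x);
#else
	return x;
#endif
}
```

The difference that matters is that the "used" attribute affects code
generation (the function is kept), while "unused" only silences a warning and
leaves the compiler free to drop the function.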

Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/compiler-gcc.h  |1 +
 include/linux/compiler-gcc3.h |   13 ++---
 include/linux/compiler-gcc4.h |3 ++-
 include/linux/compiler.h  |   17 ++---
 4 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,3 +37,4 @@
 #define  noinline  __attribute__((noinline))
 #define __attribute_pure__ __attribute__((pure))
 #define __attribute_const__        __attribute__((__const__))
+#define __maybe_unused __attribute__((unused))
diff --git a/include/linux/compiler-gcc3.h b/include/linux/compiler-gcc3.h
--- a/include/linux/compiler-gcc3.h
+++ b/include/linux/compiler-gcc3.h
@@ -3,14 +3,13 @@
 /* These definitions are for GCC v3.x.  */
 #include 
 
-#if __GNUC_MINOR__ >= 3
-# define __attribute_used__    __attribute__((__used__))
-#else
-# define __attribute_used__    __attribute__((__unused__))
-#endif
-
 #if __GNUC_MINOR__ >= 4
-#define __must_check   __attribute__((warn_unused_result))
+# define __used                __attribute__((__used__))
+# define __attribute_used__    __used  /* deprecated */
+# define __must_check  __attribute__((warn_unused_result))
+#else
+# define __used                __attribute__((__unused__))
+# define __attribute_used__    __used  /* deprecated */
 #endif
 
 #define __always_inline        inline __attribute__((always_inline))
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -12,7 +12,8 @@
 # define __inline  __inline __attribute__((always_inline))
 #endif
 
-#define __attribute_used__ __attribute__((__used__))
+#define __used __attribute__((__used__))
+#define __attribute_used__ __used  /* deprecated */
 #define __must_check   __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 #define __always_inline        inline __attribute__((always_inline))
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -108,15 +108,26 @@ extern void __chk_io_ptr(const void __iomem *);
  * Allow us to avoid 'defined but not used' warnings on functions and data,
  * as well as force them to be emitted to the assembly file.
  *
- * As of gcc 3.3, static functions that are not marked with attribute((used))
- * may be elided from the assembly file.  As of gcc 3.3, static data not so
+ * As of gcc 3.4, static functions that are not marked with attribute((used))
+ * may be elided from the assembly file.  As of gcc 3.4, static data not so
  * marked will not be elided, but this may change in a future gcc version.
  *
  * In prior versions of gcc, such functions and data would be emitted, but
  * would be warned about except with attribute((unused)).
+ *
+ * Mark functions that are referenced only in inline assembly as __used so
+ * the code is emitted even though it appears to be unreferenced.
  */
 #ifndef __attribute_used__
-# define __attribute_used__    /* unimplemented */
+# define __attribute_used__    /* deprecated */
+#endif
+
+#ifndef __used
+# define __used                /* unimplemented */
+#endif
+
+#ifndef __maybe_unused
+# define __maybe_unused        /* unimplemented */
 #endif
 
 /*

Re: [v4l-dvb-maintainer] [PATCH 35/36] Use menuconfig objects II - DVB

2007-05-03 Thread Roman Zippel

Hi,

On Thu, 3 May 2007, Sam Ravnborg wrote:

> Please include Roman Zippel when you propose kconfig changes.

Thanks, the lkml volume lately forces me to skip a lot, so it's quite 
possible I miss something. :)

> > xconfig has the menu tree display in the left panel, where one can see the
> > overall layout of the menu tree and jump directly to any menu (even one
> > multiple levels deep).  All the menuconfigs that used to be menus don't show
> > up here anymore.
> > 
> > To turn a menuconfig off, you must go to the top level menu containing the
> > menuconfig you want (and you must know which one that is!).  Then you have 
> > to
> > drill down through each menu level one by one, by finding that menu in the 
> > top
> > panel (which also has all the config options listed) and clicking on it to 
> > get
> > to the next one.  When you get to the menuconfig you want, you must enter it
> > and then you finally get the box to turn that menuconfig off.
> > 
> > It looks like your changes are going in, so I suppose the solution is to
> > improve the way xconfig handles "menuconfig".

I don't quite understand. With the menuconfig changes more menu entries 
should  appear on the left side, so I don't understand why you have to 
"drill down" to reach it.
The rule for menu to appear on the left side is relatively simple - all 
its parents must be of menu type as well. So if a menuconfig is on the 
right side it must have a normal config entry as parent.
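
To make the rule concrete, here is a small sketch in C. It is not xconfig's
actual code: struct demo_menu and demo_shown_in_tree() are names invented for
this example, and the only thing modelled is "an entry appears in the left
panel if it and all of its parents are of menu type".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one Kconfig entry; not the real kconfig struct. */
struct demo_menu {
	const struct demo_menu *parent;	/* NULL for the top-level entry */
	bool is_menu;			/* menu or menuconfig, not a plain config */
};

/* True if the entry and every ancestor is of menu type. */
static bool demo_shown_in_tree(const struct demo_menu *m)
{
	if (!m->is_menu)
		return false;
	for (m = m->parent; m; m = m->parent)
		if (!m->is_menu)
			return false;
	return true;
}
```

Under this model a menuconfig nested below a plain config entry is not shown
in the tree, which matches the case described above where it ends up on the
right side instead.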

> > I wonder, would it be possible to change the kconfig language so that:
> > menuconfig 
> > boolean "name of menu"
> > 
> > Did the same thing as:
> > config 
> > boolean "name of menu"
> > menu "name of menu"
> > depends on 
> > 
> > This way you could change this:
> > 
> > menuconfig 
> > if 
> > [all the other options]
> > endif
> > 
> > Into this:
> > 
> > menuconfig 
> > [all the other options]
> > endmenu
> > 
> > The reason is that a frontend would easily be able to understand the 
> > coupling
> > between the "menuconfig " and the "if ".  It will make it easier for
> > the frontend to see that all the options are inside and controlled by the
> > enclosing menuconfig.

If the frontend wants to change the behaviour of a menuconfig, it can 
already do that, so this doesn't require a syntax change.

bye, Roman

Re: 2.6.22 -mm merge plans: slub on PowerPC

2007-05-03 Thread Benjamin Herrenschmidt

On Thu, 2007-05-03 at 22:04 +0100, Hugh Dickins wrote:
> On Thu, 3 May 2007, Hugh Dickins wrote:
> > 
> > Seems we're all wrong in thinking Christoph's Kconfiggery worked
> > as intended: maybe it just works some of the time.  I'm not going
> > to hazard a guess as to how to fix it up, will resume looking at
> > the powerpc's quicklist potential later.
> 
> Here's the patch I've been testing on G5, with 4k and with 64k pages,
> with SLAB and with SLUB.  But, though it doesn't crash, the pgd
> kmem_cache in the 4k-page SLUB case is revealing SLUB's propensity
> for using highorder allocations where SLAB would stick to order 0:
> under load, exec's mm_init gets page allocation failure on order 4
> - SLUB's calculate_order may need some retuning.  (I'd expect it to
> be going for order 3 actually, I'm not sure how order 4 comes about.)
> 
> I don't know how offensive Ben and Paulus may find this patch:
> the kmem_cache use was nicely done and this messes it up a little.
> 
> 
> The SLUB allocator relies on struct page fields first_page and slab,
> overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
> be used for the lowest level of pagetable pages.  This was obstructing
> SLUB on PowerPC, which uses kmem_caches for its pagetables.  So convert
> its pte level to use quicklist pages (whereas pmd, pud and 64k-page pgd
> want partpages, so continue to use kmem_caches for pmd, pud and pgd).
> But to keep up appearances for pgtable_free, we still need PTE_CACHE_NUM.

Interesting... I'll have a look asap.

Ben.



Re: [PATCH 1/8] remove "#if 0" from find_bus function, export it.

2007-05-03 Thread Anton Vorontsov

On Thu, May 03, 2007 at 04:14:59PM -0700, Greg KH wrote:
> On Fri, May 04, 2007 at 01:31:21AM +0400, Anton Vorontsov wrote:
> > This function was placed in "#if 0" because nobody was using it.
> > We are using it now.
> 
> Why?  Shouldn't you just export the pointer you need instead?

We can do one way or another. We can ask W1 bus maintainer to
export bus type. Or we can un-"if 0" generic find_bus/bus_find
function.

> And if you really want it, and you convince me you really need it,

No, I don't want it at all. But the ds2760_battery driver needs to find
the w1 bus type.

A long time ago in a galaxy far, far away we used to have the find_bus
function, then it was removed, and somewhere in the thread I linked to,
someone suggested showing a real user of that function and posting
that patch. I've just done that.

So, if you're unwilling to revert that function, please say it
explicitly, and I'll ping w1 folks to export bus type.

I really-really don't care how exactly we should find that
bus stuff.

> thanks,
> 
> greg k-h

Good luck,

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

Re: console font limits

2007-05-03 Thread Kyle Moffett


On May 03, 2007, at 16:16:51, Jan Engelhardt wrote:

On May 3 2007 13:15, H. Peter Anvin wrote:

Jan Engelhardt wrote:


But people didn't like that, and disabled text output when the  
console is in KD_GRAPHICS mode...


at the cost of not getting the kernel oops, heh.


I thought the reason we didn't display text in KD_GRAPHICS mode  
was that KD_GRAPHICS might mean "in a completely different mode  
that only userspace knows about."


Hrm. Maybe we need a distinction into KD_KGRAPHICS and KD_UGRAPHICS  
then.


Actually I think the real problem was that "KD_GRAPHICS" got  
overloaded to mean "some userspace program is probably poking at the  
GPU in very direct ways possibly including /dev/mem".  As such it  
really isn't safe at all for the kernel to write stuff to the screen  
in that situation; you could turn a panic()+reboot-after-30-secs into  
an unrecoverable hard PCI bus lockup.  IIRC there were at least a  
couple chipsets which had that problem with X.  If we can implement  
enough APIs for X to do all of its stuff from userspace without  
iopl() or /dev/mem then we could probably bring back the option for  
dumping oopses to screen in KD_GRAPHICS mode, but otherwise it'll  
just cause more headaches.


Cheers,
Kyle Moffett


[PATCH] m68knommu: use generic irq framework

2007-05-03 Thread Greg Ungerer

Change the m68knommu irq handling to use the generic irq framework.

Signed-off-by: Greg Ungerer <[EMAIL PROTECTED]>
---

 arch/m68knommu/Kconfig |4 
 arch/m68knommu/kernel/Makefile |4 
 arch/m68knommu/kernel/asm-offsets.c|5 
 arch/m68knommu/kernel/irq.c|   82 +
 arch/m68knommu/kernel/setup.c  |6 
 arch/m68knommu/kernel/traps.c  |2 
 arch/m68knommu/platform/5307/Makefile  |2 
 arch/m68knommu/platform/5307/entry.S   |   39 +---
 arch/m68knommu/platform/5307/ints.c|  279 -
 arch/m68knommu/platform/5307/vectors.c |   29 ++-
 arch/m68knommu/platform/68328/entry.S  |   10 -
 arch/m68knommu/platform/68328/ints.c   |  130 ++-
 arch/m68knommu/platform/68360/entry.S  |6 
 arch/m68knommu/platform/68360/ints.c   |  233 +--
 include/asm-m68knommu/irq.h|   75 
 include/asm-m68knommu/irqnode.h|   36 
 include/asm-m68knommu/m68360.h |8 
 include/asm-m68knommu/machdep.h|   10 -
 include/asm-m68knommu/traps.h  |4 
 19 files changed, 171 insertions(+), 793 deletions(-)


diff -Naur linux-2.6.21/arch/m68knommu/Kconfig linux-2.6.21-gt/arch/m68knommu/Kconfig
--- linux-2.6.21/arch/m68knommu/Kconfig 2007-04-26 13:08:32.0 +1000
+++ linux-2.6.21-gt/arch/m68knommu/Kconfig  2007-05-04 00:20:43.0 +1000
@@ -45,6 +45,10 @@
bool
default y
 
+config GENERIC_HARDIRQS
+   bool
+   default y
+
 config GENERIC_CALIBRATE_DELAY
bool
default y
diff -Naur linux-2.6.21/arch/m68knommu/kernel/asm-offsets.c linux-2.6.21-gt/arch/m68knommu/kernel/asm-offsets.c
--- linux-2.6.21/arch/m68knommu/kernel/asm-offsets.c    2007-04-26 13:08:32.0 +1000
+++ linux-2.6.21-gt/arch/m68knommu/kernel/asm-offsets.c 2007-05-04 00:20:44.0 +1000
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #define DEFINE(sym, val) \
@@ -72,10 +71,6 @@
 #else
/* bitfields are a bit difficult */
DEFINE(PT_VECTOR, offsetof(struct pt_regs, pc) + 4);
-   /* offsets into the irq_handler struct */
-   DEFINE(IRQ_HANDLER, offsetof(struct irq_node, handler));
-   DEFINE(IRQ_DEVID, offsetof(struct irq_node, dev_id));
-   DEFINE(IRQ_NEXT, offsetof(struct irq_node, next));
 #endif
 
/* offsets into the kernel_stat struct */
diff -Naur linux-2.6.21/arch/m68knommu/kernel/irq.c linux-2.6.21-gt/arch/m68knommu/kernel/irq.c
--- linux-2.6.21/arch/m68knommu/kernel/irq.c    1970-01-01 10:00:00.0 +1000
+++ linux-2.6.21-gt/arch/m68knommu/kernel/irq.c 2007-05-04 00:20:44.0 +1000
@@ -0,0 +1,82 @@
+/*
+ * arch/m68knommu/kernel/irq.c
+ *
+ * (C) Copyright 2007, Greg Ungerer <[EMAIL PROTECTED]>
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file COPYING in the main directory of this archive
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
+{
+   struct pt_regs *oldregs = set_irq_regs(regs);
+
+   irq_enter();
+   __do_IRQ(irq);
+   irq_exit();
+
+   set_irq_regs(oldregs);
+}
+
+void ack_bad_irq(unsigned int irq)
+{
+   printk("IRQ: unexpected irq=%d\n", irq);
+}
+
+static struct irq_chip m_irq_chip = {
+   .name   = "M68K-INTC",
+   .enable = enable_vector,
+   .disable= disable_vector,
+   .ack= ack_vector,
+};
+
+void __init init_IRQ(void)
+{
+   int irq;
+
+   init_vectors();
+
+   for (irq = 0; (irq < NR_IRQS); irq++) {
+   irq_desc[irq].status = IRQ_DISABLED;
+   irq_desc[irq].action = NULL;
+   irq_desc[irq].depth = 1;
+   irq_desc[irq].chip = &m_irq_chip;
+   }
+}
+
+int show_interrupts(struct seq_file *p, void *v)
+{
+   struct irqaction *ap;
+   int irq = *((loff_t *) v);
+
+   if (irq == 0)
+   seq_puts(p, "   CPU0\n");
+
+   if (irq < NR_IRQS) {
+   ap = irq_desc[irq].action;
+   if (ap) {
+   seq_printf(p, "%3d: ", irq);
+   seq_printf(p, "%10u ", kstat_irqs(irq));
+   seq_printf(p, "%14s  ", irq_desc[irq].chip->name);
+
+   seq_printf(p, "%s", ap->name);
+   for (ap = ap->next; ap; ap = ap->next)
+   seq_printf(p, ", %s", ap->name);
+   seq_putc(p, '\n');
+   }
+   }
+
+   return 0;
+}
+
diff -Naur linux-2.6.21/arch/m68knommu/kernel/Makefile linux-2.6.21-gt/arch/m68knommu/kernel/Makefile
--- linux-2.6.21/arch/m68knommu/kernel/Makefile 2007-04-26 13:08:32.0 +1000
+++ linux-2.6.21-gt/arch/m68knommu/kernel/Makefile  2007-05-04 00:20:43.0 +10

Re: [RELEASE] Lguest for 2.6.21

2007-05-03 Thread Rusty Russell

On Thu, 2007-05-03 at 11:02 -0500, Matt Mackall wrote:
> On Thu, May 03, 2007 at 12:43:48AM +1000, Rusty Russell wrote:
> > http://lguest.ozlabs.org/lguest-2.6.21-254.patch.gz
> > 
> > See Documentation/lguest/lguest.txt for how to run,
> > drivers/lguest/README for the draft code documentation journey.
> 
> Your lguest readme is quite lacking in the area of how to configure a
> guest kernel as opposed to the host kernel. More hand-holding, please.

Hi Matt!

Ah, that's because they are the same kernel.  Turning on CONFIG_LGUEST
builds-in the parts needed to be a guest as well.

Thanks for pointing out that weakness.  I will modify lguest.txt to make
that clear.

Cheers,
Rusty.


Re: Routing 600+ vlan's via linux problems (looks like arp problems)

2007-05-03 Thread Willy Tarreau

On Fri, May 04, 2007 at 12:50:17AM +0200, Jan Engelhardt wrote:
> 
> On May 4 2007 00:23, Willy Tarreau wrote:
> >
> >> This setup will only run for about 1-2 hours while we fix the hardware
> >> router (it is running now, but only on a backup flash card solution.
> >> the harddrive in it died ;)
> >
> >Huhhh! Please tell us exactly what make and model of ROUTER you are using
> >which embeds a HARD DRIVE, so that we recall never to buy that ! Having
> >seen uptimes of 5 years on moderately big access routers, I would have
> >found it awful to see them die multiple times in that timeframe because
> >of a crappy IDE drive inside !
> 
> Haha. Would you be happy if it ran on a CF card instead? :>

Yes, because at least when you design a system to run on a CF card, you
ensure never to write on it because you know that would kill it. Then
since you never write on it, it does not wear out and has no problem
running for years (unless you bought cheap end-user CF of course). But
industrial-grade CF *is* reliable for such usages. People having problems
with CF are dumb asses who install a full standard system on those
(sometimes even with swap) then complain it dies after one year.

A hard disk simply fails after some time even if you never use it at all.
A head flying 10 microns above a platter passing at 33 m/s obviously likes
to caress it sometimes, with a polite "oops sorry" excuse that you hear
meters away.

That's a pretty bad design to put such a SPOF in some equipment which IMHO
has no real justification for embedding one, really.

Cheers,
Willy


ExpressCard hotswap support?

2007-05-03 Thread Chris Adams

I've got a Thinkpad Z60m with an ExpressCard slot, and I got a Belkin
F5U250 GigE ExpressCard (Marvell 88E8053 chip using sky2 driver).  It
appears that Linux only recognizes it if I insert the card with the
system powered off.  If I hot-insert the card, nothing happens (no
messages logged, no PCI device shows up, nothing).

Does Linux support hotswapping ExpressCards?

This is with Fedora Core 6 with all updates, kernel 2.6.20-1.2948.fc6.
-- 
Chris Adams <[EMAIL PROTECTED]>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

Re: [linux-dvb] DST/BT878 module customization (.. was: Critical points about ...)

2007-05-03 Thread Uwe Bugla

-------- Original Message --------
Date: Fri, 4 May 2007 00:06:51 +0200
From: "Markus Rechberger" <[EMAIL PROTECTED]>
To: "Manu Abraham" <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED], linux-kernel@vger.kernel.org
Subject: Re: [linux-dvb] DST/BT878 module customization (.. was: Critical points about ...)

> On 5/3/07, Manu Abraham <[EMAIL PROTECTED]> wrote:
> > Markus Rechberger wrote:
> > > On 5/3/07, Manu Abraham <[EMAIL PROTECTED]> wrote:
> > >> Mauro Carvalho Chehab wrote:
> > >> > Enough. Let's stop arguing non technical issues.
> > >> >
> > >> > If either one of you has any technical argument against Trent's
> > >> > patches, please point out where the fix is wrong. Otherwise, if you wish,
> > >> > you may send an acked-by agreeing with the fix.
> > >> >
> > >>
> > >> Why don't you stop this childish behaviour ?
> > >>
> > >> After explaining to you the reasons in the previous mail:
> > >> being the author and maintainer of dst/dst_ca and maintainer of
> > >> dvb-bt8xx, i NACK this change
> > >>
> > >> (1) You aren't DVB maintainer
> > >
> > > I've seen that too often already, now we could point to a mail someone
> > > sent to Uwe regarding maintainership.
> >
> >
> > FYI, I have never written to Uwe regarding any sort of maintainership.
> > You seem to be quite up with an overdose of drugs
> >
> 
> I mean the mail from Helge Hafting (thread  [linux-dvb] Critical
> points about kernel 2.6.21 and pseudo-authorities) at the very first
> beginning.
> 
> > From 2005/09/13 - 2007/05/03 (till date) there have been 15 mails from
> > my side to Uwe, none of which has a topic whatsoever you say. Only the
> > first mail was a private mail and that is CC'd to Johannes as well.
> >
> > Firstly you seem to play politics by getting Uwe to flame me, then when
> > it backfired, you are trying to play tricks with the rest of the
> > community as well, by spreading nonsense statements.
> >
> 
> I sent several comments to Uwe to stop flaming; Trent was sometimes in
> the CC. I never wrote that he should flame anyone.
> I can simply forward you all the mails I sent to Uwe; there's not one bad mail.
> 
> My point is moreover to get that issue sorted out by either accepting
> his "proposal" or stating out why not to add it (and there must be a
> reason behind it, and no mail which is 2 years old, or explaining what
> the device is, again it got explained what's required from you)
> 
> It seems your response is based on that misunderstood sentence;
> sorry for not being clear enough.
> 
> Markus

Hi Markus, fine chap,
Please cool down...

I guess I understood Manu's response:

a. He just changed his priorities to pick up an old project that seemed to have 
died, but did not die at all - this project is called cx878 project, and it is 
the most radical approach that I ever have seen - trying to make all BT8xx 
drivers independent from bttv, which is not horrible, but only consequent, 
necessary, and good and fine.
Please see my previous mails on that issue.

Just read the ML to get the appropriate link and please get involved to 
help develop it. I swear it is the right path, although I am still missing 
the avoidance of dvb-pll.c. A closer look into that module will quite easily 
tell you that there aren't any BT8xx based PCI cards needing that module 
except the ones needing the lgdt330x frontend driver, which is maintained by 
Mike Krufky. So for all other cards treated by the dvb-bt8xx backend this 
module is nothing but heavily obsolete and nonsense, if not to say: RAM-wasting.

b. In so far, Manu's statements do not base on any mail that is 2 years old, 
but he simply changed his mind, after it was necessarily me personally to build 
up "the golden bridge" for him, Mike and others as well.

c. I am deeply thankful for your diplomatic behaviour involving Trent, as this 
brought up Manu to react in the end instead of crawling back into his snail 
house.

d. But please let us establish peace among each other now, because without 
peace we will not be able to continue the whole thing...

Hi Trent,
I want to thank you for all your efforts - as they at least work for my deep 
satisfaction, but they may not work for other people as well for simply 
technical reasons (example: treating dst and dst_ca as one simple case does no 
good at all, does it?), but our primadonna Manuel Abraham simply follows 
another far more radical path - to get the whole thing independent from bttv, 
which is the RIGHT path.

Your invested energies weren't wasted at all, but they only approach "plan a" 
while "plan b" goes much further than "plan a." It is as simple as that.

And, as I stated already, I am open for both plans - and if the more radical 
one gains more mercy I will not disagree, but simply follow it and try my 
best to improve it.

Hi Mauro,

I would deeply appreciate you to pull my "proposal" for the Kconfig in the 
frontends section as at least the semantic problem gotta be resolved (SPO 
instead of SO - whoe

Re: [PATCH 2/2] revoke: change revoke_table to fileset and revoke_details

2007-05-03 Thread Andrew Morton

On Thu, 3 May 2007 23:32:28 +0300 (EEST)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:

> On Thu, 3 May 2007, Andrew Morton wrote:
> > > +/**
> > > + * fileset - an array of file pointers.
> > > + * @files:the array of file pointers
> > > + * @nr:   number of elements in the array
> > > + * @end:  index to next unused file pointer
> > > + */
> > > +struct fileset {
> > > + struct file **files;
> > > + unsigned long   nr;
> > > + unsigned long   end;
> > > +};
> > 
> > What's the locking protocol for all this?
> 
> What do you mean? There is no concurrent access going on here.

Well that's the "locking" protocol then: each instance of this structure is
only ever touched by a single thread, yes?

> On Thu, 3 May 2007, Andrew Morton wrote:
> > > +static void free_fset(struct fileset *fset)
> > > +{
> > > +  int i;
> > > +
> > > +  for (i = fset->end; i < fset->nr; i++)
> > > +  fput(fset->files[i]);
> > > +
> > > +  kfree(fset->files);
> > > +  kfree(fset);
> > > +}
> > 
> > Confused.  Shouldn't it be
> > 
> > for (i = 0; i < fset->end; i++)
> 
> No. The fset->end is an index to the first _unused_ file pointer. All 
> entries before that are in use by revoked file descriptors so we don't 
> want to fput() them.
> 

OK.
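
To make the index convention above concrete, here is a minimal user-space
sketch, not the code from the patch. It assumes that entries in [0, end) have
been handed out and are owned elsewhere, while entries in [end, nr) are still
unused and must be released when the set is freed. struct file is reduced to
a reference count, and toy_fput(), alloc_fset() and fset_take() are
hypothetical helpers invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the kernel's struct file: only a reference count. */
struct file {
	int refcount;
};

/* Same layout as the quoted struct: [0, end) handed out, [end, nr) unused. */
struct fileset {
	struct file **files;
	unsigned long nr;
	unsigned long end;
};

/* Hypothetical user-space replacement for fput(): drop one reference. */
static void toy_fput(struct file *file)
{
	assert(file->refcount > 0);
	file->refcount--;
}

/* Build a set of nr slots, each pointing at one pre-referenced file. */
static struct fileset *alloc_fset(struct file *pool, unsigned long nr)
{
	struct fileset *fset = malloc(sizeof(*fset));
	unsigned long i;

	if (!fset)
		return NULL;
	fset->files = malloc(nr * sizeof(*fset->files));
	if (!fset->files) {
		free(fset);
		return NULL;
	}
	for (i = 0; i < nr; i++)
		fset->files[i] = &pool[i];
	fset->nr = nr;
	fset->end = 0;
	return fset;
}

/* Hand the next unused file to a caller, who now owns its reference. */
static struct file *fset_take(struct fileset *fset)
{
	if (fset->end == fset->nr)
		return NULL;
	return fset->files[fset->end++];
}

/* Release only the slots nobody took, then the bookkeeping itself. */
static void free_fset(struct fileset *fset)
{
	unsigned long i;

	for (i = fset->end; i < fset->nr; i++)
		toy_fput(fset->files[i]);
	free(fset->files);
	free(fset);
}
```

The loop index here is unsigned long to match the types of end and nr; the
quoted free_fset() uses int, which mixes signed and unsigned in the
comparison.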
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [POWERPC] 8xx: mpc885ads pcmcia support

2007-05-03 Thread Arnd Bergmann

On Friday 04 May 2007, Vitaly Bordug wrote:
> Adds support for PowerQuicc on-chip PCMCIA. The driver is implemented as
> an of_device, so only arch/powerpc code can use it, which currently
> means only the mpc885ads reference board.
> 
> Some code that has to be hooked into the driver is really board specific
> (like set_voltage), so a global structure, mpc8xx_pcmcia_ops, holds the
> necessary function pointers, which are filled in by the BSP code.
> 
> Signed-off-by: Vitaly Bordug <[EMAIL PROTECTED]> 

Acked-by: Arnd Bergmann <[EMAIL PROTECTED]>

Re: [PATCH 3/8] Universal power supply class (was: battery class)

2007-05-03 Thread Anton Vorontsov

On Thu, May 03, 2007 at 03:53:46PM -0700, Greg KH wrote:
> On Fri, May 04, 2007 at 01:31:39AM +0400, Anton Vorontsov wrote:
> > This class is the result of merging the "external power" and "battery"
> > classes, as suggested by David Woodhouse. He also implemented uevent support.
> > 
> > Here is how userspace sees it now:
> > 
> > # ls /sys/class/power\ supply/
> > ac  main-battery  usb
> 
> Please don't put a space in a class name.  Yes, we can do it, but some
> scripts will bomb.  If you look all of the other classes use a '_'
> instead.

Ack. Would power_supply be okay, or is power-supply better?

> > # cat /sys/class/power\ supply/
> > ac/   main-battery/ usb/
> 
> Um, shouldn't that be an error?  Isn't /sys/class/power\ supply/ a
> directory?

Actually that class does not work, I'm just faking the output and lying that
it works. ;-)

Just kidding. It's just shell completion.

> > # cat /sys/class/power\ supply/ac/type
> > AC
> > 
> > # cat /sys/class/power\ supply/usb/type
> > USB
> > 
> > # cat /sys/class/power\ supply/main-battery/type
> > Battery
> > 
> > # cat /sys/class/power\ supply/ac/online
> > 1
> > 
> > # cat /sys/class/power\ supply/usb/online
> > 0
> 
> I don't really understand, is the 'usb' and 'ac' directories really a
> 'struct device' here?  Shouldn't there be some symlinks around here?
> 
> Can you do a 'tree /sys/class/power\ supply/' and show me the output?

# tree
-bash: tree: command not found

# ls -al /sys/class/power\ supply/
total 0
drwxr-xr-x  2 root root 0 Jan  1 00:01 .
drwxr-xr-x 20 root root 0 Jan  1 00:01 ..
lrwxrwxrwx  1 root root 0 Jan  1 00:01 ac -> ../../devices/platform/pda-power/ac
lrwxrwxrwx  1 root root 0 Jan  1 00:01 usb -> ../../devices/platform/pda-power/usb

# ls -al /sys/class/power\ supply/ac/
total 0
drwxr-xr-x 3 root root    0 Jan  1 00:01 .
drwxr-xr-x 5 root root    0 Jan  1 00:01 ..
-r--r--r-- 1 root root 4096 Jan  1 00:01 online
drwxr-xr-x 2 root root    0 Jan  1 00:01 power
lrwxrwxrwx 1 root root    0 Jan  1 00:01 subsystem -> ../../../../class/power supply
-r--r--r-- 1 root root 4096 Jan  1 00:01 type
--w------- 1 root root 4096 Jan  1 00:01 uevent

> > 
> > # cat /sys/class/power\ supply/main-battery/status
> > Charging
> > 
> > # cat /sys/class/leds/h5400\:red-left/trigger
> > none h5400-radio timer hwtimer ac-online usb-online
> > main-battery-charging-or-full [main-battery-charging]
> > main-battery-full
> 
> Huh?  What does the led have to do with the battery?

Have you read Documentation/power_supply_class.txt? Quoting:

"It also integrates with LED framework, for the purpose of providing
typically expected feedback of battery charging/fully charged status and
AC/USB power supply online status. (Note that specific details of the
indication (including whether to use it at all) are fully controllable by
user and/or specific machine defaults, per design principles of LED
framework)."

So, PDAs/phones use LEDs to provide feedback on battery charging
status. You put the PDA into its cradle and LED1 starts to flash; when the
battery is fully charged, LED1 turns off and another LED2 (with a different
color) starts flashing.

> > diff --git a/drivers/power/Makefile b/drivers/power/Makefile
> > new file mode 100644
> > index 000..95085ba
> > --- /dev/null
> > +++ b/drivers/power/Makefile
> > @@ -0,0 +1,15 @@
> > +power_supply-objs := power_supply_core.o
> > +
> > +ifeq ($(CONFIG_SYSFS),y)
> > +power_supply-objs += power_supply_sysfs.o
> > +endif
> 
> Why would this work at all without sysfs?

I don't know, because it can? I haven't tested it w/o sysfs, though.

But sysfs is just one of the interfaces the power supply class uses to
"export" power supply information to user-space. apm_power
is another. And who knows what new interfaces we'll see later.

> > +
> > +static int __init power_supply_class_init(void)
> > +{
> > +   power_supply_class = class_create(THIS_MODULE, "power supply");
> 
> Please use "power_supply" instead as mentioned above.

Ack again.

> > --- /dev/null
> > +++ b/drivers/power/power_supply_sysfs.c
> > @@ -0,0 +1,254 @@
> > +/*
> > + *  Sysfs interface for the universal power supply monitor class
> > + *
> > + *  Copyright ??  2007  David Woodhouse <[EMAIL PROTECTED]>
> 
> What's with the ??

:-) It's because my locale is not UTF-8 aware, and mutt destroyed the (c)
symbol.

> > + *  Copyright (c) 2007  Anton Vorontsov <[EMAIL PROTECTED]>
> > + *  Copyright (c) 2004  Szabolcs Gyurko
> > + *  Copyright (c) 2003  Ian Molton <[EMAIL PROTECTED]>
> > + *
> > + *  Modified: 2004, Oct Szabolcs Gyurko
> > + *
> > + *  You may use this code as per GPL version 2
> > + */
> > +
> > +#include 
> > +
> > +/*
> > + * This is because the name "current" breaks the device attr macro.
> > + * The "current" word resolvs to "(get_current())" so instead

[PATCH] [POWERPC] 8xx: mpc885ads pcmcia support

2007-05-03 Thread Vitaly Bordug


Adds support for PowerQuicc on-chip PCMCIA. The driver is implemented as
an of_device, so only arch/powerpc code can use it, which currently
means only the mpc885ads reference board.

Some code that has to be hooked into the driver is really board specific
(like set_voltage), so a global structure, mpc8xx_pcmcia_ops, holds the
necessary function pointers, which are filled in by the BSP code.

Signed-off-by: Vitaly Bordug <[EMAIL PROTECTED]>
  
---

 arch/powerpc/boot/dts/mpc885ads.dts          |   12 +
 arch/powerpc/platforms/8xx/m8xx_setup.c      |    5
 arch/powerpc/platforms/8xx/mpc885ads.h       |    5
 arch/powerpc/platforms/8xx/mpc885ads_setup.c |   77 ++
 arch/powerpc/sysdev/fsl_soc.c                |   12 +
 drivers/pcmcia/Kconfig                       |    1
 drivers/pcmcia/m8xx_pcmcia.c                 |  352 --
 include/linux/fsl_devices.h                  |    5
 8 files changed, 279 insertions(+), 190 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc885ads.dts b/arch/powerpc/boot/dts/mpc885ads.dts
index 110bf61..56a9f6a 100644
--- a/arch/powerpc/boot/dts/mpc885ads.dts
+++ b/arch/powerpc/boot/dts/mpc885ads.dts
@@ -112,6 +112,18 @@
compatible = "CPM";
};
 
+   [EMAIL PROTECTED] {
+   linux,phandle = <0080>;
+   #interrupt-cells = <1>;
+   #size-cells = <2>;
+   compatible = "fsl,pq-pcmcia";
+   device_type = "pcmcia";
+   reg = <80 80>;
+   clock-frequency = <2faf080>;
+   interrupt-parent = ;
+   interrupts = ;
+   };
+
[EMAIL PROTECTED] {
linux,phandle = ;
#address-cells = <1>;
diff --git a/arch/powerpc/platforms/8xx/m8xx_setup.c b/arch/powerpc/platforms/8xx/m8xx_setup.c
index 0901dba..f169355 100644
--- a/arch/powerpc/platforms/8xx/m8xx_setup.c
+++ b/arch/powerpc/platforms/8xx/m8xx_setup.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -49,6 +50,10 @@
 
 #include "sysdev/mpc8xx_pic.h"
 
+#ifdef CONFIG_PCMCIA_M8XX
+struct mpc8xx_pcmcia_ops m8xx_pcmcia_ops;
+#endif
+
 void m8xx_calibrate_decr(void);
 extern void m8xx_wdt_handler_install(bd_t *bp);
 extern int cpm_pic_init(void);
diff --git a/arch/powerpc/platforms/8xx/mpc885ads.h b/arch/powerpc/platforms/8xx/mpc885ads.h
index 7c31aec..4439346 100644
--- a/arch/powerpc/platforms/8xx/mpc885ads.h
+++ b/arch/powerpc/platforms/8xx/mpc885ads.h
@@ -91,5 +91,10 @@
 #define SICR_ENET_MASK ((uint)0x00ff)
 #define SICR_ENET_CLKRT((uint)0x002c)
 
+/* Some internal interrupt registers use an 8-bit mask for the interrupt
+ * level instead of a number.
+ */
+#define mk_int_int_mask(IL) (1 << (7 - (IL/2)))
+
 #endif /* __ASM_MPC885ADS_H__ */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/platforms/8xx/mpc885ads_setup.c b/arch/powerpc/platforms/8xx/mpc885ads_setup.c
index a57b577..a339026 100644
--- a/arch/powerpc/platforms/8xx/mpc885ads_setup.c
+++ b/arch/powerpc/platforms/8xx/mpc885ads_setup.c
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -51,6 +52,12 @@ static void init_smc1_uart_ioports(struct fs_uart_platform_info* fpi);
 static void init_smc2_uart_ioports(struct fs_uart_platform_info* fpi);
 static void init_scc3_ioports(struct fs_platform_info* ptr);
 
+#ifdef CONFIG_PCMCIA_M8XX
+extern struct mpc8xx_pcmcia_ops m8xx_pcmcia_ops;
+static void pcmcia_hw_setup(int slot, int enable);
+static int pcmcia_set_voltage(int slot, int vcc, int vpp);
+#endif
+
 void __init mpc885ads_board_setup(void)
 {
cpm8xx_t *cp;
@@ -115,6 +122,12 @@ void __init mpc885ads_board_setup(void)
immr_unmap(io_port);
 
 #endif
+
+#ifdef CONFIG_PCMCIA_M8XX
+   /*Set up board specific hook-ups*/
+   m8xx_pcmcia_ops.hw_ctrl = pcmcia_hw_setup;
+   m8xx_pcmcia_ops.voltage_set = pcmcia_set_voltage;
+#endif
 }
 
 
@@ -322,6 +335,70 @@ void init_smc_ioports(struct fs_uart_platform_info *data)
}
 }
 
+#ifdef CONFIG_PCMCIA_M8XX
+static void pcmcia_hw_setup(int slot, int enable)
+{
+   unsigned *bcsr_io;
+
+   bcsr_io = ioremap(BCSR1, sizeof(unsigned long));
+   if (enable)
+   clrbits32(bcsr_io, BCSR1_PCCEN);
+   else
+   setbits32(bcsr_io, BCSR1_PCCEN);
+
+   iounmap(bcsr_io);
+}
+
+static int pcmcia_set_voltage(int slot, int vcc, int vpp)
+{
+u32 reg = 0;
+unsigned *bcsr_io;
+
+bcsr_io = ioremap(BCSR1, sizeof(unsigned long));
+
+switch(vcc) {
+case 0:
+break;
+case 33:
+reg |= BCSR1_PCCVCC0;
+break;
+case 50:
+reg |= BCSR1_PCCVCC1;
+

Re: [patch] export hrtimer_forward

2007-05-03 Thread Andrew Morton

On Thu, 03 May 2007 23:10:02 +0400
Stas Sergeev <[EMAIL PROTECTED]> wrote:

> Hello.
> 
> Peter Zijlstra wrote:
> >> It seems hrtimer_forward was forgotten to
> >> export - other symbols of the hrtimers API
> > Are there actual in-tree users of this symbol? Without we usually leave
> > the symbol unexported, this saves some space.
> Do you mean it was really left out intentionally?
> Unbelievable! But why are the other parts of the
> hrtimer API exported nevertheless, and
> only this particular function is not?

It was probably an oversight - generally we take the position that all the
formal interface of a subsystem is exported to modules rather than a
piecemeal whichever-bits-kernel.org-happens-to-use-today approach.

Thomas, is hrtimer_forward() considered part of the hrtimer public API? 
And are you OK with the patch?


> As for the users - I am porting my pcsp driver to
> it and I need that function.
> It is not exactly in-tree stuff, but it was
> in an ALSA tree for years already, so it is a
> close one.

Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-05-03 Thread Andrew Morton

On Thu, 03 May 2007 21:38:10 +0400
Alex Tomas <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > We can make great improvements here, and I've (twice) previously described
> > how: hoist the entire ordered-mode data handling out of ext3, and out of
> > the buffer_head layer and move it up into the VFS pagecache layer. 
> > Basically, do ordered-data with a commit-time inode walk, calling
> > do_sync_mapping_range().
> > 
> > Do it in the VFS.  Make reiserfs use it, remove reiserfs ordered-mode too. 
> > Make XFS use it, fix the hey-my-files-are-all-full-of-zeroes problem there.
> 
> I'm not sure it's that easy.
> 
> if we move to pages, then we have to mark pages to be flushed holding
> transaction open. now take delayed allocation into account: we need
> to allocate number of blocks at once and then mark all pages mapped,
> again within context of the same transaction.

Yes, there can be issues with needing to allocate journal space within the
context of a commit.  But

a) If the page has newly allocated space on disk then the metadata which
   refers to that page is already in the journal: no new journal space
   needed.

b) If the page doesn't have space allocated on disk then we don't need
   to write it out at ordered-mode commit time, because the post-recovery
   filesystem will not have any references to that page.

c) If the page is dirty due to overwrite then no metadata update was required.

IOW, under what circumstances would an ordered-mode commit need to allocate
space for a delayed-allocate page?

However b) might lead to the hey-my-file-is-full-of-zeroes problem.

> so, an implementation
> would look like the following?
> 
> generic_writepages() {
>   /* collect set of contig. dirty pages */
>   foo_get_blocks() {
>   foo_journal_start();
>   foo_new_blocks();
>   foo_attach_blocks_to_inode();
>   generic_mark_pages_mapped();
>   foo_journal_stop();
>   }
> }
> 
> another question is will it scale well given number of dirty inodes
> can be much larger than number of inodes with dirty mapped blocks
> (in delayed allocation case, for example) ?

Possibly - zillions of dirty-for-atime inodes might get in the way.  A
short-term fix would be to create a separate dirty-inode list on the
superblock (ug).  A long-term fix is to rip all the per-superblock
dirty-inode lists and use a radix-tree.  Not for lookup purposes, but for
the tree's ability to do tagged and restartable searches.
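
A rough illustration of what "tagged and restartable" buys, as a user-space
sketch only. It swaps the radix tree for a flat array, so it does not skip
untagged entries cheaply the way a tagged tree walk would; it only shows the
interface: ask for the next entry carrying a tag at or after a cursor, and
resume from that cursor later. All names (toy_inode, the TOY_TAG_* bits,
next_tagged(), walk_tagged()) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TAG_DIRTY        (1u << 0)  /* inode is dirty for any reason */
#define TOY_TAG_DIRTY_MAPPED (1u << 1)  /* inode has dirty mapped blocks */

struct toy_inode {
	unsigned long ino;
	unsigned int tags;
};

/* Index of the first entry at or after start that carries tag, or n. */
static size_t next_tagged(const struct toy_inode *inodes, size_t n,
			  size_t start, unsigned int tag)
{
	size_t i;

	for (i = start; i < n; i++)
		if (inodes[i].tags & tag)
			return i;
	return n;
}

/*
 * Visit up to batch tagged inodes starting at *cursor and record their
 * numbers in out. The cursor is advanced so a later call resumes the walk.
 */
static size_t walk_tagged(const struct toy_inode *inodes, size_t n,
			  size_t *cursor, unsigned int tag,
			  unsigned long *out, size_t batch)
{
	size_t found = 0;

	assert(*cursor <= n);
	while (found < batch) {
		size_t i = next_tagged(inodes, n, *cursor, tag);

		if (i == n) {
			*cursor = n;
			break;
		}
		out[found++] = inodes[i].ino;
		*cursor = i + 1;
	}
	return found;
}
```

A commit-time walk would ask only for TOY_TAG_DIRTY_MAPPED, so inodes that
are merely dirty for atime never show up in the result, and the walk can be
done in batches by keeping the cursor between calls.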

[PATCH] input: fix aux port detection with some i8042 chips

2007-05-03 Thread Roland Scheidegger

From: Roland Scheidegger <[EMAIL PROTECTED]>

The i8042 driver fails detection of the AUX port with some chips,
because they apparently do not change the I8042_CTR_AUXDIS bit
immediately. This is known to affect at least HP500 / HP510 notebooks,
consequently the built-in touchpad will not work. The patch will simply
reread the value until it gets the expected value or a retry limit is
hit, without touching other workaround code in the same area.

Signed-off-by: Roland Scheidegger <[EMAIL PROTECTED]>

---
There is some discussion about non-working touchpads in HP500 notebooks
in ubuntu and a (ugly) workaround for this problem here:
http://ubuntuforums.org/showthread.php?t=344103. I've got a HP510 and
even with 2.6.21 the aux port would get disabled. Works with the patch,
for the record the i8042 here needs around 6 tries (sometimes a bit
more, sometimes less) until it reads the I8042_CTR_AUXDIS bit correctly,
both after disabling and enabling the aux port.
(please CC: on any replies)

Signed-off-by: Roland Scheidegger <[EMAIL PROTECTED]>

--- linux-2.6/drivers/input/serio/i8042.c.orig  2007-05-03 16:32:26.0 +0200
+++ linux-2.6/drivers/input/serio/i8042.c   2007-05-03 16:56:00.0 +0200
@@ -537,6 +537,7 @@ static int __devinit i8042_check_aux(voi
int retval = -1;
int irq_registered = 0;
int aux_loop_broken = 0;
+   int i = 0;
unsigned long flags;
unsigned char param;

@@ -582,14 +583,27 @@ static int __devinit i8042_check_aux(voi

if (i8042_command(&param, I8042_CMD_AUX_DISABLE))
return -1;
-   if (i8042_command(&param, I8042_CMD_CTL_RCTR) || (~param & I8042_CTR_AUXDIS)) {
+   /* some chips need some time to set the I8042_CTR_AUXDIS bit */
+   for (i = 0; i < 100; i++) {
+   if (!i8042_command(&param, I8042_CMD_CTL_RCTR) && (param & I8042_CTR_AUXDIS))
+   break;
+   udelay(50);
+   }
+   if (i == 100) {
printk(KERN_WARNING "Failed to disable AUX port, but continuing anyway... Is this a SiS?\n");
printk(KERN_WARNING "If AUX port is really absent please use the 'i8042.noaux' option.\n");
}

if (i8042_command(&param, I8042_CMD_AUX_ENABLE))
return -1;
-   if (i8042_command(&param, I8042_CMD_CTL_RCTR) || (param & I8042_CTR_AUXDIS))
+   for (i = 0; i < 100; i++) {
+   if (i8042_command(&param, I8042_CMD_CTL_RCTR))
+   return -1;
+   if (~param & I8042_CTR_AUXDIS)
+   break;
+   udelay(50);
+   }
+   if (i == 100)
return -1;

 /*
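
The retry idea in the patch can be tried out in user space with a fake
controller. This is a sketch under assumptions, not driver code: struct
fake_chip, fake_read_ctr() and wait_for_auxdis() are invented for the
illustration, FAKE_CTR_AUXDIS merely stands in for I8042_CTR_AUXDIS, and the
udelay(50) between reads is left out. The retry limit of 100 is the one used
in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_CTR_AUXDIS 0x20  /* stands in for I8042_CTR_AUXDIS */
#define MAX_TRIES 100         /* same retry limit as the patch */

/* A controller whose register contents lag behind the last command. */
struct fake_chip {
	unsigned char ctr;      /* value currently visible to reads */
	unsigned char pending;  /* value that becomes visible later */
	int lag;                /* stale reads left before pending shows up */
};

/* Read the control register; returns 0 on success like i8042_command(). */
static int fake_read_ctr(struct fake_chip *chip, unsigned char *out)
{
	if (chip->lag > 0)
		chip->lag--;
	else
		chip->ctr = chip->pending;
	*out = chip->ctr;
	return 0;
}

/*
 * Re-read the register until the AUXDIS bit has the wanted state.
 * Returns the number of reads it took, or -1 on read error or timeout.
 */
static int wait_for_auxdis(struct fake_chip *chip, bool want_set)
{
	unsigned char param;
	int i;

	assert(chip != NULL);
	for (i = 0; i < MAX_TRIES; i++) {
		if (fake_read_ctr(chip, &param))
			return -1;
		if (((param & FAKE_CTR_AUXDIS) != 0) == want_set)
			return i + 1;
		/* the patch waits with udelay(50) here; omitted in this sketch */
	}
	return -1;
}
```

A chip that shows the new value on the first read returns 1, a slow one
returns a larger count, and one that never updates within 100 reads returns
-1, which corresponds to the "i == 100" branches in the patch.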

Linux 2.6.16.50

2007-05-03 Thread Adrian Bunk

Security fixes since 2.6.16.49:
- CVE-2007-1861: [NETLINK]: Infinite recursion in netlink
- CVE-2007-2242: [IPV6]: Disallow RH0 by default


Location:
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git

RSS feed of the git tree:
http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.16.y.git;a=rss


Changes since 2.6.16.49:

Adrian Bunk (2):
  Linux 2.6.16.50-rc1
  Linux 2.6.16.50

Al Viro (1):
  mca_nmi_hook() can be called at any point

Alexey Kuznetsov (1):
  [NETLINK]: Infinite recursion in netlink (CVE-2007-1861)

Guennadi Liakhovetski (1):
  IrDA: irttp_dup spin_lock initialisation

Jeet Chaudhuri (1):
  IrDA: Incorrect TTP header reservation

Jiri Slaby (1):
  Char: icom, mark __init as __devinit

Shaohua Li (1):
  x86 microcode: don't check the size

YOSHIFUJI Hideaki (1):
  [IPV6]: Disallow RH0 by default (CVE-2007-2242)

Zach Brown (1):
  aio: remove bare user-triggerable error printk


 Documentation/networking/ip-sysctl.txt |    9 ++
 Makefile                               |    2 -
 arch/i386/kernel/microcode.c           |    9 +-
 arch/i386/mach-default/setup.c         |    2 -
 drivers/serial/icom.c                  |    4 +-
 fs/aio.c                               |    1
 include/linux/ipv6.h                   |    9 ++
 include/linux/sysctl.h                 |    1
 net/ipv4/fib_frontend.c                |   12 +++-
 net/ipv6/addrconf.c                    |   11 +++
 net/ipv6/exthdrs.c                     |   37 -
 net/irda/irttp.c                       |    5 ++-
 12 files changed, 80 insertions(+), 22 deletions(-)



RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Chen, Tim C

Christoph Lameter wrote:
> Try to boot with
> 
> slub_max_order=4 slub_min_objects=8
> 
> If that does not help increase slub_min_objects to 16.
> 

With slub_min_objects=16 and slub_max_order=4 on the 2.6.21-rc7-mm2
kernel, we still see a 5% regression on TCP streaming and a 10%
regression on Volanomark.  Performance with slub_min_objects=8 and 16
is similar.

>> We found that for Netperf's TCP streaming tests in a loop back mode,
>> the TCP streaming performance is about 7% worse when SLUB is enabled
>> on the 2.6.21-rc7-mm1 kernel (x86_64).  This test has a lot of sk_buff
>> allocation/deallocation.
> 
> 2.6.21-rc7-mm2 contains some performance fixes that may or may not be
> useful to you.

We've switched to 2.6.21-rc7-mm2 in our tests now.

>> 
>> For Volanomark, the performance is 7% worse for Woodcrest and 12%
>> worse for Clovertown.
> 
> SLUBs "queueing" is restricted to the number of objects that fit in
> page order slab. SLAB can queue more objects since it has true queues.
> Increasing the page size that SLUB uses may fix the problem but then
> we run into higher page order issues.
> 
> Check slabinfo output for the network slabs and see what order is
> used. The number of objects per slab is important for performance.

The order used is 0 for the buffer_head, which is the most used object.

I think they are 104 bytes per object.

Tim
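
For the numbers above, a back-of-the-envelope sketch of how many objects fit
in a slab of a given page order. It assumes 4096-byte pages and ignores any
per-slab metadata, alignment or padding, so it is not SLUB's real order
calculation; TOY_PAGE_SIZE, objects_per_slab() and min_order_for() are names
made up for the example.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096ul  /* assumed page size */

/* Objects that fit in one slab of 2^order pages, ignoring all overhead. */
static unsigned long objects_per_slab(unsigned int order,
				      unsigned long object_size)
{
	assert(object_size > 0);
	return (TOY_PAGE_SIZE << order) / object_size;
}

/*
 * Smallest order whose slab holds at least min_objects objects.
 * Returns max_order if no smaller order is large enough.
 */
static unsigned int min_order_for(unsigned long min_objects,
				  unsigned long object_size,
				  unsigned int max_order)
{
	unsigned int order;

	for (order = 0; order < max_order; order++)
		if (objects_per_slab(order, object_size) >= min_objects)
			break;
	return order;
}
```

Under this simplified model a 104-byte object already fits 39 times into an
order-0 slab, so asking for a minimum of 8 or 16 objects would not change the
order for that cache. That may be one reason the two settings behave alike
for it, but it is only a guess from the arithmetic.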

Re: regression on quad Xeon: no SCSI-disks

2007-05-03 Thread Ivan Kokshaysky

On Thu, May 03, 2007 at 10:41:41AM +0200, Wolfgang Erig wrote:
> I am prepared to do tweaks to your small patch, but I need your help.
> My own blind experiments failed miserably.

I don't think that patch did anything wrong, most likely it just
triggered a bug elsewhere. These two lines from your dmesg look
very suspicious:

> PCI: Cannot allocate resource region 0 of device :00:04.0
> PCI: Error while updating region :00:04.0/0 (a8008000 != fec08000)

Note that the BAR seems to have high address bits hardwired to fec0.
And device :00:04.0 is
> 00:04.0 System peripheral: Siemens Nixdorf AG FSC Multiprocessor Interrupt Controller (rev 02)

I'd guess that when we try to reassign this resource, PCI interrupts might
just stop working. This could explain SCSI timeouts and other weird things.

Maybe this patch helps?

Ivan.

--- 2.6.21/arch/i386/pci/fixup.c2007-02-04 21:44:54.0 +0300
+++ linux/arch/i386/pci/fixup.c 2007-05-04 01:58:32.629654275 +0400
@@ -436,3 +436,14 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CY
pci_early_fixup_cyrix_5530);
 DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5530_LEGACY,
pci_early_fixup_cyrix_5530);
+
+/*
+ * Siemens Nixdorf AG FSC Multiprocessor Interrupt Controller:
+ * prevent update of the BAR0, which doesn't look like a normal BAR.
+ */
+static void __devinit pci_siemens_interrupt_controller(struct pci_dev *dev)
+{
+   dev->resource[0].flags |= IORESOURCE_PCI_FIXED;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SIEMENS, 0x0015,
+ pci_siemens_interrupt_controller);
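
The failure mode guessed at above can be modelled in a few lines of
user-space C. This is a toy, not kernel code: it assumes the upper 16 bits of
the register are hardwired (the exact mask is a guess made for the example),
and struct toy_bar, toy_bar_write() and toy_update_bar() are hypothetical
names. The pci_fixed flag stands in for IORESOURCE_PCI_FIXED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_bar {
	uint32_t fixed_mask;  /* bits the device refuses to change */
	uint32_t value;       /* current register contents */
	bool pci_fixed;       /* stands in for IORESOURCE_PCI_FIXED */
};

/* A write only lands in the bits that are not hardwired. */
static void toy_bar_write(struct toy_bar *bar, uint32_t v)
{
	bar->value = (bar->value & bar->fixed_mask) | (v & ~bar->fixed_mask);
}

/*
 * Try to move the BAR to new_addr and verify by reading it back.
 * Returns 0 on success (or when the resource is marked fixed and
 * therefore left alone), -1 when the read-back value differs.
 */
static int toy_update_bar(struct toy_bar *bar, uint32_t new_addr)
{
	assert(bar != NULL);
	if (bar->pci_fixed)
		return 0;
	toy_bar_write(bar, new_addr);
	return bar->value == new_addr ? 0 : -1;
}
```

With fixed_mask 0xffff0000 and a starting value of 0xfec08000, an attempt to
write 0xa8008000 reads back as 0xfec08000, which matches the mismatch in the
dmesg line quoted above. With the fixed flag set the register is never
touched.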

Re: [PATCH 1/8] remove "#if 0" from find_bus function, export it.

2007-05-03 Thread Greg KH

On Fri, May 04, 2007 at 01:31:21AM +0400, Anton Vorontsov wrote:
> This function was placed in "#if 0" because nobody was using it.
> We are using it now.

Why?  Shouldn't you just export the pointer you need instead?

> See http://lwn.net/Articles/210610/

I don't understand the need for this link, it talks about how the api
changes all the time, something we all know :)

And if you really want it, and you convince me you really need it, can
you change it to be "bus_find" to play nicer in the namespace?

thanks,

greg k-h

Re: [git pull] New firewire stack

2007-05-03 Thread Jonathan Woithe

> Jonathan Woithe wrote:
> >> Olaf Hering wrote:
> >>> NACK.
> >>> Upgrade the current drivers/ieee1394/ with the new code, and keep all
> >>> existing module names.
> [...]
> > However, as a compromise how about renaming the existing stack's modules and
> > then reusing the existing names for the new stack?  Messy I know, but this
> > way both stacks would still be available without recompilation for those who
> > needed them and the sbp2-as-root dilemma raised by Olaf would also be
> > covered.
> 
> I.e. new modules:
>   ieee1394 (was fw-core)
>   ohci1394 (was fw-ohci)
>   :

> old modules, for example:
>   ieee1394-old
>   ohci1394-old
>   :
> 
> Looks... weird.
> 
> On the other hand, a 1394 module compilation cycle in order to do the
> fallback is not such a huge issue, except that it requires the person to
> be able to compile modules.  That's probably the main issue.

True on all counts.  I guess it's a question of whether the lack of an easy
fallback path will significantly reduce the number of testers.  I don't have
enough of a feel to answer that.

>   eth1394  (to be done) --- but that's a bad name anyway, it
>   implements IP over 1394, not Ethernet

So, when eth1394 is ported the name should be something like fw-ip, at least
if we are to remain consistent with the other 3 module names.

> > Oh yes, it would be nice to have working PCILynx support again (although I
> > acknowledge it's unlikely to happen).  Some of us do have these cards
> > installed for sniffing purposes (using nosy) but it would be nice to be able
> > to use them with libraw1394 as well.  It would for example save me having to
> > swap cards depending on what I needed to do (I have insufficient PCI slots
> > to have both the PCILynx and OHCI cards installed simultaneously).
> 
> But then, what is the actual utility of pcilynx?  (I mean the current
> driver, not the card or a future driver.)  Last time I checked, sbp2 was
> broken without OHCI's physical DMA, and AFAIK raw1394's newer iso API
> and video1394 and dv1394 don't work with pcilynx either.

It certainly doesn't support the raw1394 API so its current usefulness is
extremely limited.

> Porting pcilynx to the new low-level API would be quite resource
> demanding --- seen in relation to which resources we have, what the
> existing pcilynx driver's state of affairs is, and how rare the hardware
> is.  (For those who have the hardware, the stand-alone Nosy is
> undoubtedly the killer application, not pcilynx.)

Precisely.  As I said, I've probably got a corner case and it's certainly
not worth the effort just for that.  It would be nice though.  You're right
about nosy; so long as nosy (which is independent of the firewire stack)
keeps working I'll be happy. :)

Regards
  jonathan

Re: [linux-dvb] DST/BT878 module customization (.. was: Critical points about ...)

2007-05-03 Thread Markus Rechberger

On 5/4/07, Manu Abraham <[EMAIL PROTECTED]> wrote:

Markus Rechberger wrote:

> I mean the mail from Helge Hafting (thread  [linux-dvb] Critical
> points about kernel 2.6.21 and pseudo-authorities) at the very first
> beginning.
>

I am replying to this mail, just because someone's spreading lies all
around.
On the mentioned thread, what i wrote (and that was the only mail from
my side):

There is a saying: "He who lives by the sword, dies by the sword."

And what issues are outstanding from these discussions? I went over it
and it just shows that there were communication problems in
2005.

We now have open issues with several device drivers and that's what we
should focus on.
Markus

Re: [PATCH 3/8] Universal power supply class (was: battery class)

2007-05-03 Thread CaT

On Thu, May 03, 2007 at 03:53:46PM -0700, Greg KH wrote:
> > # cat /sys/class/power\ supply/
> > ac/   main-battery/ usb/
> 
> Um, shouldn't that be an error?  Isn't /sys/class/power\ supply/ a
> directory?

I think that's more of a case of:

cat /sys/class/power\ supply/

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby

Re: [PATCH 3/8] Universal power supply class (was: battery class)

2007-05-03 Thread Greg KH

On Fri, May 04, 2007 at 01:31:39AM +0400, Anton Vorontsov wrote:
> This class is the result of merging the "external power" and "battery"
> classes, as suggested by David Woodhouse. He also implemented uevent support.
> 
> Here is how userspace sees it now:
> 
>   # ls /sys/class/power\ supply/
>   ac  main-battery  usb

Please don't put a space in a class name.  Yes, we can do it, but some
scripts will bomb.  If you look all of the other classes use a '_'
instead.

>   # cat /sys/class/power\ supply/
>   ac/   main-battery/ usb/

Um, shouldn't that be an error?  Isn't /sys/class/power\ supply/ a
directory?

>   # cat /sys/class/power\ supply/ac/type
>   AC
> 
>   # cat /sys/class/power\ supply/usb/type
>   USB
> 
>   # cat /sys/class/power\ supply/main-battery/type
>   Battery
> 
>   # cat /sys/class/power\ supply/ac/online
>   1
> 
>   # cat /sys/class/power\ supply/usb/online
>   0

I don't really understand, is the 'usb' and 'ac' directories really a
'struct device' here?  Shouldn't there be some symlinks around here?

Can you do a 'tree /sys/class/power\ supply/' and show me the output?


> 
>   # cat /sys/class/power\ supply/main-battery/status
>   Charging
> 
>   # cat /sys/class/leds/h5400\:red-left/trigger
>   none h5400-radio timer hwtimer ac-online usb-online
>   main-battery-charging-or-full [main-battery-charging]
>   main-battery-full

Huh?  What does the led have to do with the battery?


> 
> Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>
> Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
> ---
>  Documentation/power_supply_class.txt |  167 ++
>  drivers/Kconfig  |2 +
>  drivers/Makefile |1 +
>  drivers/power/Kconfig|   17 +++
>  drivers/power/Makefile   |   15 ++
>  drivers/power/power_supply.h |   42 ++
>  drivers/power/power_supply_core.c|  168 ++
>  drivers/power/power_supply_leds.c|  176 +++
>  drivers/power/power_supply_sysfs.c   |  254 ++
>  include/linux/power_supply.h |  169 ++
>  10 files changed, 1011 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/power_supply_class.txt
>  create mode 100644 drivers/power/Kconfig
>  create mode 100644 drivers/power/Makefile
>  create mode 100644 drivers/power/power_supply.h
>  create mode 100644 drivers/power/power_supply_core.c
>  create mode 100644 drivers/power/power_supply_leds.c
>  create mode 100644 drivers/power/power_supply_sysfs.c
>  create mode 100644 include/linux/power_supply.h
> 
> diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt
> new file mode 100644
> index 000..666941f
> --- /dev/null
> +++ b/Documentation/power_supply_class.txt
> @@ -0,0 +1,167 @@
> +Linux power supply class
> +
> +
> +Synopsis
> +
> +Power supply class used to represent battery, UPS, AC or DC power supply
> +properties to user-space.
> +
> +It defines core set of attributes, which should be applicable to (almost)
> +every power supply out there. Attributes are available via sysfs and uevent
> +interfaces.
> +
> +Each attribute has well defined meaning, up to unit of measure used. While
> +the attributes provided are believed to be universally applicable to any
> +power supply, specific monitoring hardware may not be able to provide them
> +all, so any of them may be skipped.
> +
> +Power supply class is extensible, and allows to define drivers own attributes.
> +The core attribute set is subject to the standard Linux evolution (i.e.
> +if it will be found that some attribute is applicable to many power supply
> +types or their drivers, it can be added to the core set).
> +
> +It also integrates with LED framework, for the purpose of providing
> +typically expected feedback of battery charging/fully charged status and
> +AC/USB power supply online status. (Note that specific details of the
> +indication (including whether to use it at all) are fully controllable by
> +user and/or specific machine defaults, per design principles of LED
> +framework).
> +
> +
> +Attributes/properties
> +~~~~~~~~~~~~~~~~~~~~~
> +Power supply class has predefined set of attributes, this eliminates code
> +duplication across drivers. Power supply class insist on reusing its
> +predefined attributes *and* their units.
> +
> +So, userspace gets predictable set of attributes and their units for any
> +kind of power supply, and can process/present them to a user in consistent
> +manner. Results for different power supplies and machines are also directly
> +comparable.
> +
> +See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the
> +example how to declare and handle attributes.
> +
> +
> +Units
> +~~~~~
> +Quoting include/linux/power_supply.h:
> +
> +  All voltages, currents,

Re: [Kernel-discuss] [PATCH 3/8] Universal power supply class (was: battery class)

2007-05-03 Thread Anton Vorontsov

On Thu, May 03, 2007 at 11:14:26PM +0100, ian wrote:
> On Fri, 2007-05-04 at 01:31 +0400, Anton Vorontsov wrote:
> > # cat /sys/class/power\ supply/ac/type
> > AC
> > 
> > # cat /sys/class/power\ supply/usb/type
> > USB 
> 
> isnt that a bit redundant?

Let me note that "usb"/"ac" are just the names the pda-power driver
gives these supplies. So it's not a power supply class issue,
but a pda-power one.

As for pda-power... Yes, it could name them "supply0" and "supply1",
or maybe "pda-supplyX", but I don't see any need to maim a name just
because it is very similar to its type. ;-) Anyhow, I don't care
much: if you or anyone else insists, I'll change
pda-power's supply names with no problem.

Thanks,

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

Error using the buffer cache

2007-05-03 Thread Thanos Makatos

I've written a module that acts as a cache for fixed-size objects, but I
get a soft lockup when trying to use the buffer cache.

I've attached a module that reproduces the error. You need to supply the
module with a block device, e.g.
insmod disk_cache.ko devname="/dev/hda2"

/*
 * An object oriented disk cache.
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
//#include "assert.h"
//#include "debug.h"
#define assert(s) if(!(s)) { printk(KERN_EMERG "assertion failed: %d: ", __LINE__); panic(#s);}

#define KERNEL_SECTOR_SIZE 512

MODULE_LICENSE("Dual BSD/GPL");

struct disk_cache {
	int object_size;		/* object size */
	sector_t cache_size;		/* number of objects to be held in cache */
	sector_t disk_size;		/* maximum number of objects */
	int objects_per_sector;		/* number of objects per sector */
	sector_t start;			/* start sector for data storage */
};

static int major_num = 0;

struct block_device *bdev = NULL;

sector_t start = 0;

static char *devname = NULL;
module_param(devname, charp, 0);

/*
 * Create and initialize a disk cache.
 * @disk_cache - an allocated disk_cache structure
 * @object_size - the object size to use (in bytes)
 * @disk_size - total disk size for objects (in objects)
 */
int disk_cache_create(struct disk_cache* const disk_cache, const int object_size, const sector_t disk_size) {

	assert(disk_cache);
	assert(disk_size > 0);

	disk_cache->object_size = object_size;
	disk_cache->disk_size = disk_size;
	disk_cache->objects_per_sector = bdev_get_queue(bdev)->hardsect_size / object_size;
	disk_cache->start = start;
	start += ((unsigned long)disk_size / (unsigned long)disk_cache->objects_per_sector) + 1;
	return 1;
}
EXPORT_SYMBOL(disk_cache_create);

/*
 * Returns an address where the requested object is found.
 */
void * get_object(const struct disk_cache* disk_cache, const sector_t object_number) {

	int offset = 0;
	sector_t sector = 0;
	struct buffer_head *bh = NULL;

	assert(disk_cache);
	assert(object_number < disk_cache->disk_size);

	/*
	 * Compute the sector in which the requested object resides,
	 * and the byte offset of the object within that sector.
	 */
	sector = disk_cache->start + (unsigned long)object_number / (unsigned long)disk_cache->objects_per_sector;
	//assert(sector < disk_cache->disk_size);
	//assert(sector < get_capacity(bdev->bd_disk));
	offset = ((unsigned long)object_number % (unsigned long)disk_cache->objects_per_sector) * disk_cache->object_size;

	bh = __bread(bdev, sector, bdev_get_queue(bdev)->hardsect_size);
	//return bh->b_data + offset;
	return NULL;
}
EXPORT_SYMBOL(get_object);

/*
 * Puts given object back to the buffer cache. Flag 'modified' must be set if
 * the object was modified.
 */
void put_object(const struct disk_cache* const disk_cache, const sector_t object_number, const int modified) {

	struct buffer_head *bh = NULL;
	sector_t sector = disk_cache->start + (unsigned long)object_number / (unsigned long)disk_cache->objects_per_sector;

	bh = __bread(bdev, sector, bdev_get_queue(bdev)->hardsect_size);
	//if(modified) {
	//	lock_buffer(bh);
	//	set_buffer_uptodate(bh);
	//	mark_buffer_dirty(bh);
	//	unlock_buffer(bh);
	//}
	brelse(bh);
	/*
	 * an additional release, since we didn't release it in get_object
	 */
	brelse(bh);
}
EXPORT_SYMBOL(put_object);

static void test(void) {

	struct disk_cache dk;
	int i;
	void *p;

	if(!disk_cache_create(&dk, 6, 100)) {
		printk(KERN_ERR "create cache error\n");
		return;
	}
	for(i = 0; i < 100; i++) {
		get_object(&dk, i);
		put_object(&dk, i, 0);
	}
}

static int __init disk_cache_init(void) {

	major_num = register_blkdev(major_num, "disk_cache");
	if (major_num <= 0) {
		printk(KERN_ERR "disk_cache: unable to get major number\n");
		return -EINVAL;
	}

	if(!devname) {
		printk(KERN_ERR "disk_cache: must supply a valid block device\n");
		return -EINVAL;
	}

	bdev = open_bdev_excl(devname, 0, NULL);
	if(IS_ERR(bdev)) {
		printk(KERN_ERR "disk_cache: cannot open device %s.\n", devname);
		return -EINVAL;
	}

	printk(KERN_INFO "disk_cache: using device %s\n", devname);
	printk(KERN_INFO "disk_cache: %llu sectors\n", get_capacity(bdev->bd_disk));

	test();

	return 0;
}

stati
