Re: [PATCHv2 net-next] dropwatch: Support monitoring of dropped frames
On 04. 08. 20 at 18:09, izabela.bakoll...@gmail.com wrote:
> From: Izabela Bakollari
>
> Dropwatch is a utility that monitors dropped frames by having userspace
> record them over the dropwatch protocol over a file. This augmentation
> allows live monitoring of dropped frames using tools like tcpdump.
>
> With this feature, dropwatch allows two additional commands (start and
> stop interface) which allows the assignment of a net_device to the
> dropwatch protocol. When assigned, dropwatch will clone dropped frames,
> and receive them on the assigned interface, allowing tools like tcpdump
> to monitor for them.
>
> With this feature, create a dummy ethernet interface (ip link add dev
> dummy0 type dummy), assign it to the dropwatch kernel subsystem, by using
> these new commands, and then monitor dropped frames in real time by
> running tcpdump -i dummy0.
>
> Signed-off-by: Izabela Bakollari
> ---
> Changes in v2:
> - protect the dummy ethernet interface from being changed by another
>   thread/cpu
> ---
>  include/uapi/linux/net_dropmon.h |  3 ++
>  net/core/drop_monitor.c          | 84
>  2 files changed, 87 insertions(+)
> [...]
> @@ -255,6 +259,21 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
>  out:
>  	spin_unlock_irqrestore(&data->lock, flags);
> +	spin_lock_irqsave(&interface_lock, flags);
> +	if (interface && interface != skb->dev) {
> +		skb = skb_clone(skb, GFP_ATOMIC);

I suggest naming the cloned skb "nskb". Less potential for confusion that way.

> +		if (skb) {
> +			skb->dev = interface;
> +			spin_unlock_irqrestore(&interface_lock, flags);
> +			netif_receive_skb(skb);
> +		} else {
> +			spin_unlock_irqrestore(&interface_lock, flags);
> +			pr_err("dropwatch: Not enough memory to clone dropped skb\n");

Maybe avoid logging the error here. In NET_DM_ALERT_MODE_PACKET mode, drop monitor does not log the skb_clone() failure either. We don't want to open the possibility of flooding the logs in case this somehow gets triggered by every packet.

A coding style suggestion: can you rearrange it so that the error path code is spelled out first? Then the regular path does not have to be indented further:

	nskb = skb_clone(skb, GFP_ATOMIC);
	if (!nskb) {
		spin_unlock_irqrestore(&interface_lock, flags);
		return;
	}

	/* ... implicit else ... Proceed normally ... */

> +			return;
> +		}
> +	} else {
> +		spin_unlock_irqrestore(&interface_lock, flags);
> +	}
>  }
>
>  static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb, void *location)
> @@ -1315,6 +1334,53 @@ static int net_dm_cmd_trace(struct sk_buff *skb,
>  	return -EOPNOTSUPP;
>  }
>
> +static int net_dm_interface_start(struct net *net, const char *ifname)
> +{
> +	struct net_device *nd = dev_get_by_name(net, ifname);
> +
> +	if (nd)
> +		interface = nd;
> +	else
> +		return -ENODEV;
> +
> +	return 0;

Similarly here, consider:

	if (!nd)
		return -ENODEV;

	interface = nd;
	return 0;

But maybe I'm nitpicking ...

> +}
> +
> +static int net_dm_interface_stop(struct net *net, const char *ifname)
> +{
> +	dev_put(interface);
> +	interface = NULL;
> +
> +	return 0;
> +}
> +
> +static int net_dm_cmd_ifc_trace(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct net *net = sock_net(skb->sk);
> +	char ifname[IFNAMSIZ];
> +
> +	if (net_dm_is_monitoring())
> +		return -EBUSY;
> +
> +	memset(ifname, 0, IFNAMSIZ);
> +	nla_strlcpy(ifname, info->attrs[NET_DM_ATTR_IFNAME], IFNAMSIZ - 1);
> +
> +	switch (info->genlhdr->cmd) {
> +	case NET_DM_CMD_START_IFC:
> +		if (!interface)
> +			return net_dm_interface_start(net, ifname);
> +		else
> +			return -EBUSY;
> +	case NET_DM_CMD_STOP_IFC:
> +		if (interface)
> +			return net_dm_interface_stop(net, interface->name);
> +		else
> +			return -ENODEV;

... and here too.

Best regards,
Michal
Re: [GIT PULL] kdbus for 4.1-rc1
On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> it seems [systemd] has now mandated group scheduling.

What makes you think so? Was it the fact that by default you have a populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some unit requests the use of the cpu controller using one of the CPU*= directives from systemd.resource-control(5), or (perhaps more likely) because there is a privileged unit with Delegate=yes. The most likely candidate is user@0.service, and so you could try preventing it from starting:

  systemctl mask user@0.service

Note that systemd still works without group scheduling or any cgroup subsystems enabled in the kernel:

  $ grep GROUP .config
  CONFIG_CGROUPS=y
  # CONFIG_CGROUP_DEBUG is not set
  # CONFIG_CGROUP_FREEZER is not set
  # CONFIG_CGROUP_DEVICE is not set
  # CONFIG_CGROUP_CPUACCT is not set
  # CONFIG_CGROUP_HUGETLB is not set
  # CONFIG_CGROUP_PERF is not set
  # CONFIG_CGROUP_SCHED is not set
  # CONFIG_BLK_CGROUP is not set
  # CONFIG_SCHED_AUTOGROUP is not set
  # CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
  # CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
  # CONFIG_NET_CLS_CGROUP is not set
  # CONFIG_CGROUP_NET_PRIO is not set
  # CONFIG_CGROUP_NET_CLASSID is not set

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: The ext3 way of journalling
On Wed, 9 Jan 2008 07:25:56 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 09, 2008 at 10:54:11AM +0100, Martin Schwidefsky wrote:
> > On Jan 8, 2008 7:15 PM, Theodore Tso <[EMAIL PROTECTED]> wrote:
> > > That will fix this issue. The problem you are facing is that
> > > you have your hardware clock set to ticking localtime, instead of
> > > GMT. Windows ticks localtime, which is a mistake carried over
> > > from the 1970's and MS-DOS. Ticking localtime has all sorts of
> > > problems, among which is if you reboot around the transition
> > > between Summer Time (or Daylight Savings Time, depending on your
> > > country) and normal time, the OS has no idea whether the DST
> > > adjustment has been applied or not.
> >
> > Actually you can force Windows to accept a hardware clock in UTC:
> > HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal
>
> Oh, so cool!!! Do you know off hand what version of Windows started
> honoring that registry setting?
>
> And what do you set that registry value to? Just a boolean "true"?
>
> Now, how to convince Ubuntu to put this in their FAQ so I stop having
> their ahhh, less than clueful dual-booting Windows users who happen to
> live in Europe stop submitting bugs on this issue

According to http://www.cl.cam.ac.uk/~mgk25/mswish/ut-rtc.html it's been there since Windows NT, but it is more or less broken in all newer versions.

Michal
Re: [PATCH] kthread: always create the kernel threads with normal priority
On Mon, 7 Jan 2008 09:29:56 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 7 Jan 2008 12:09:04 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > > > This causes a practical problem. When a runaway real-time task
> > > > is eating 100% CPU and we attempt to put the CPU offline,
> > > > sometimes we block while waiting for the creation of the
> > > > highest-priority "kstopmachine" thread.
> >
> > sched-devel.git has new mechanisms against runaway RT tasks.
> > There's a new RLIMIT_RTTIME rlimit - if an RT task exceeds that
> > rlimit then it is sent SIGXCPU.
>
> Is that "total RT CPU time" or "elapsed time since last schedule()"?
>
> If the former, it is not useful for this problem.

It's "runtime since last sleep" so it is useful.

I still think the kthread patch is good to have anyway. The user can have other reasons to change kthreadd's priority/cpumask.

Michal
Re: [PATCH] kthread: always create the kernel threads with normal priority
On Mon, 7 Jan 2008 02:25:13 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Mon, 7 Jan 2008 11:06:03 +0100 Michal Schmidt > <[EMAIL PROTECTED]> wrote: > > > On Sat, 22 Dec 2007 01:30:21 -0800 > > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt > > > <[EMAIL PROTECTED]> wrote: > > > > > > > kthreadd, the creator of other kernel threads, runs as a normal > > > > priority task. This is a potential for priority inversion when a > > > > task wants to spawn a high-priority kernel thread. A middle > > > > priority SCHED_FIFO task can block kthreadd's execution > > > > indefinitely and thus prevent the timely creation of the > > > > high-priority kernel thread. > > > > This causes a practical problem. When a runaway real-time task > > > > is eating 100% CPU and we attempt to put the CPU offline, > > > > sometimes we block while waiting for the creation of the > > > > highest-priority "kstopmachine" thread. > > > > > > > > The fix is to run kthreadd with the highest possible SCHED_FIFO > > > > priority. Its children must still run as slightly negatively > > > > reniced SCHED_NORMAL tasks. > > > > > > Did you hit this problem with the stock kernel, or have you been > > > working on other stuff? > > > > This was with RHEL5 and with current Fedora kernels. > > > > > A locked-up SCHED_FIFO process will cause kernel threads all > > > sorts of problems. You've hit one instance, but there will be > > > others. (pdflush stops working, for one). > > > > > > The general approach we've taken to this is "don't do that". > > > Yes, we could boost lots of kernel threads in the way which this > > > patch does but this actually takes control *away* from > > > userspace. Userspace no longer has the ability to guarantee > > > itself minimum possible latency without getting preempted by > > > kernel threads. 
> > > > > > And yes, giving userspace this minimum-latency capability does > > > imply that userspace has a responsibility to not 100% starve > > > kernel threads. It's a reasonable compromise, I think? > > > > You're right. We should not run kthreadd with SCHED_FIFO by default. > > But the user should be able to change it using chrt if he wants to > > avoid this particular problem. So how about this instead?: > > > > > > > > kthreadd, the creator of other kernel threads, runs as a normal > > priority task. This is a potential for priority inversion when a > > task wants to spawn a high-priority kernel thread. A middle > > priority SCHED_FIFO task can block kthreadd's execution > > indefinitely and thus prevent the timely creation of the > > high-priority kernel thread. > > > > This causes a practical problem. When a runaway real-time task is > > eating 100% CPU and we attempt to put the CPU offline, sometimes we > > block while waiting for the creation of the highest-priority > > "kstopmachine" thread. > > > > This could be solved by always running kthreadd with the highest > > possible SCHED_FIFO priority, but that would be undesirable policy > > decision in the kernel. kthreadd would cause unwanted latencies > > even for the realtime users who know what they're doing. > > > > Let's not make the decision for the user. Just allow the > > administrator to change kthreadd's priority safely if he chooses to > > do it. Ensure that the kernel threads are created with the usual > > nice level even if kthreadd's priority is changed from the default. 
> > > > Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]> > > --- > > kernel/kthread.c | 11 +++ > > 1 files changed, 11 insertions(+), 0 deletions(-) > > > > diff --git a/kernel/kthread.c b/kernel/kthread.c > > index dcfe724..e832a85 100644 > > --- a/kernel/kthread.c > > +++ b/kernel/kthread.c > > @@ -94,10 +94,21 @@ static void create_kthread(struct > > kthread_create_info *create) if (pid < 0) { > > create->result = ERR_PTR(pid); > > } else { > > + struct sched_param param = { .sched_priority = 0 }; > > wait_for_completion(&create->started); > > read_lock(&tasklist_lock); > > create->result = find_task_by_pid(pid); > > read_unl
Re: [PATCH] kthread: always create the kernel threads with normal priority
On Mon, 7 Jan 2008 12:22:51 +0100 "Remy Bohmer" <[EMAIL PROTECTED]> wrote:
> Hello Michal and Andrew,
>
> > Let's not make the decision for the user. Just allow the
> > administrator to change kthreadd's priority safely if he chooses to
> > do it. Ensure that the kernel threads are created with the usual
> > nice level even if kthreadd's priority is changed from the default.
>
> Last year, I posted a patchset (that was meant for Preempt-RT at that
> time) to be able to prioritise the interrupt-handler-threads (which
> are kthreads) and softirq-threads from the kernel commandline. See
> http://lkml.org/lkml/2007/12/19/208
>
> Maybe we can find a way to use a similar mechanism as I used in my
> patchset for the priorities of the remaining kthreads.
> I do not like the way of forcing userland to change the priorities,
> because that would require a userland with the chrt tool installed,
> and that is not that practical for embedded systems (in which there
> could be cases that there is no userland at all, or the init-process
> is the whole embedded application). In that case an option to do it on
> the kernel commandline is more practical.
>
> I propose this kernel cmd-line option:
> kthread_pmap=somethread:50,otherthread:12,34

I see. kthreadd would look up the priority for itself and kthread_create would consult the map for all other kernel threads. That should work.

Your sirq_pmap would not be needed anymore, as kthread_pmap could be used for softirq threads too, right?

Michal
[PATCH] kthread: always create the kernel threads with normal priority
On Sat, 22 Dec 2007 01:30:21 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt
> <[EMAIL PROTECTED]> wrote:
>
> > kthreadd, the creator of other kernel threads, runs as a normal
> > priority task. This is a potential for priority inversion when a
> > task wants to spawn a high-priority kernel thread. A middle priority
> > SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> > prevent the timely creation of the high-priority kernel thread.
> >
> > This causes a practical problem. When a runaway real-time task is
> > eating 100% CPU and we attempt to put the CPU offline, sometimes we
> > block while waiting for the creation of the highest-priority
> > "kstopmachine" thread.
> >
> > The fix is to run kthreadd with the highest possible SCHED_FIFO
> > priority. Its children must still run as slightly negatively reniced
> > SCHED_NORMAL tasks.
>
> Did you hit this problem with the stock kernel, or have you been
> working on other stuff?

This was with RHEL5 and with current Fedora kernels.

> A locked-up SCHED_FIFO process will cause kernel threads all sorts of
> problems. You've hit one instance, but there will be others.
> (pdflush stops working, for one).
>
> The general approach we've taken to this is "don't do that". Yes, we
> could boost lots of kernel threads in the way which this patch does
> but this actually takes control *away* from userspace. Userspace no
> longer has the ability to guarantee itself minimum possible latency
> without getting preempted by kernel threads.
>
> And yes, giving userspace this minimum-latency capability does imply
> that userspace has a responsibility to not 100% starve kernel
> threads. It's a reasonable compromise, I think?

You're right. We should not run kthreadd with SCHED_FIFO by default. But the user should be able to change it using chrt if he wants to avoid this particular problem. So how about this instead?:

kthreadd, the creator of other kernel threads, runs as a normal priority task. This is a potential for priority inversion when a task wants to spawn a high-priority kernel thread. A middle priority SCHED_FIFO task can block kthreadd's execution indefinitely and thus prevent the timely creation of the high-priority kernel thread.

This causes a practical problem. When a runaway real-time task is eating 100% CPU and we attempt to put the CPU offline, sometimes we block while waiting for the creation of the highest-priority "kstopmachine" thread.

This could be solved by always running kthreadd with the highest possible SCHED_FIFO priority, but that would be an undesirable policy decision in the kernel. kthreadd would cause unwanted latencies even for the realtime users who know what they're doing.

Let's not make the decision for the user. Just allow the administrator to change kthreadd's priority safely if he chooses to do it. Ensure that the kernel threads are created with the usual nice level even if kthreadd's priority is changed from the default.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>
---
 kernel/kthread.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index dcfe724..e832a85 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -94,10 +94,21 @@ static void create_kthread(struct kthread_create_info *create)
 	if (pid < 0) {
 		create->result = ERR_PTR(pid);
 	} else {
+		struct sched_param param = { .sched_priority = 0 };
 		wait_for_completion(&create->started);
 		read_lock(&tasklist_lock);
 		create->result = find_task_by_pid(pid);
 		read_unlock(&tasklist_lock);
+		/*
+		 * root may want to change our (kthreadd's) priority to
+		 * realtime to solve a corner case priority inversion problem
+		 * (a realtime task consuming 100% CPU blocking the creation of
+		 * kernel threads). The kernel thread should not inherit the
+		 * higher priority. Let's always create it with the usual nice
+		 * level.
+		 */
+		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		set_user_nice(create->result, -5);
 	}
 	complete(&create->done);
 }
-- 
1.5.3.3
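For readers who want to try the "change it using chrt" route from a program rather than the command line, here is a minimal userspace sketch of roughly what "chrt -f -p <prio> <pid>" does. The helper names clamp_fifo_priority() and make_fifo() are made up for this illustration; only sched_setscheduler(2) and sched_get_priority_min/max(2) are real interfaces. Which PID kthreadd has is an assumption to verify on the target system (it is commonly 2), and the call needs sufficient privileges.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: clamp a requested priority into the range that
 * the running system reports as valid for SCHED_FIFO. */
int clamp_fifo_priority(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	assert(lo <= hi);
	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}

/* Hypothetical helper: switch task `pid` to SCHED_FIFO at `prio`.
 * Returns 0 on success, -1 with errno set (e.g. EPERM when the caller
 * lacks the privilege to set a realtime policy). */
int make_fifo(pid_t pid, int prio)
{
	struct sched_param param = {
		.sched_priority = clamp_fifo_priority(prio),
	};

	return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

With the patch above applied, calling make_fifo() on kthreadd's PID should affect only kthreadd itself; newly created kernel threads would still be reset to SCHED_NORMAL with nice -5.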
[PATCH] kthread: run kthreadd with max priority SCHED_FIFO
kthreadd, the creator of other kernel threads, runs as a normal priority task. This is a potential for priority inversion when a task wants to spawn a high-priority kernel thread. A middle priority SCHED_FIFO task can block kthreadd's execution indefinitely and thus prevent the timely creation of the high-priority kernel thread.

This causes a practical problem. When a runaway real-time task is eating 100% CPU and we attempt to put the CPU offline, sometimes we block while waiting for the creation of the highest-priority "kstopmachine" thread.

The fix is to run kthreadd with the highest possible SCHED_FIFO priority. Its children must still run as slightly negatively reniced SCHED_NORMAL tasks.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index dcfe724..a7ce932 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -94,10 +94,17 @@ static void create_kthread(struct kthread_create_info *create)
 	if (pid < 0) {
 		create->result = ERR_PTR(pid);
 	} else {
+		struct sched_param param = { .sched_priority = 0 };
 		wait_for_completion(&create->started);
 		read_lock(&tasklist_lock);
 		create->result = find_task_by_pid(pid);
 		read_unlock(&tasklist_lock);
+		/*
+		 * We (kthreadd) run with SCHED_FIFO, but we don't want
+		 * the kthreads we create to have it too by default.
+		 */
+		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		set_user_nice(create->result, -5);
 	}
 	complete(&create->done);
 }
@@ -217,11 +224,12 @@ EXPORT_SYMBOL(kthread_stop);
 int kthreadd(void *unused)
 {
 	struct task_struct *tsk = current;
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_user_nice(tsk, -5);
+	sched_setscheduler(tsk, SCHED_FIFO, &param);
 	set_cpus_allowed(tsk, CPU_MASK_ALL);

 	current->flags |= PF_NOFREEZE;
Re: kernel panic - help!?
On Wed, 12 Dec 2007 07:24:36 -0700 Justin Banks <[EMAIL PROTECTED]> wrote:
> > > (2.6.9-55.0.9.ELsmp)
> -^^
>
> It's really really old :)

No, it's actually a kernel from RHEL-4 or CentOS that is less than 3 months old.

Michal
Re: [PATCH] isapnp driver semaphore to mutex
On Mon, 03 Dec 2007 10:35:01 -0800, Daniel Walker <[EMAIL PROTECTED]> wrote:
> Speaking of automating.. I created a little .vimrc add-on which helps
> doing sem2mutex type changes. Here's the chunk I added,
>
> function Semtomutex( lo )
>     exe '%s/down(&'.a:lo.')/mutex_lock\(\&'.a:lo.'\)/g'
>     exe '%s/down_trylock(&'.a:lo.')/mutex_trylock\(\&'.a:lo.'\)/g'

From the comment above mutex_trylock():

 * NOTE: this function follows the spin_trylock() convention, so
 * it is negated to the down_trylock() return values! Be careful
 * about this when converting semaphore users to mutexes.

Michal
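To make the warning concrete, here is a small userspace sketch of the two return conventions. It is not kernel code: struct toy_lock and the toy_* / try_work_* functions are stand-ins invented for this example and only mimic the documented return values (down_trylock() returns 0 on success, mutex_trylock() returns 1 on success). The point is that a purely textual substitution leaves the condition inverted unless the test is negated too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock; not a kernel type. */
struct toy_lock {
	bool held;
};

/* Semaphore convention: 0 if acquired, nonzero if it was busy. */
int toy_down_trylock(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		return 1;
	l->held = true;
	return 0;
}

/* Mutex/spinlock convention: 1 if acquired, 0 on contention. */
int toy_mutex_trylock(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		return 0;
	l->held = true;
	return 1;
}

/* Before the conversion: a nonzero return means "busy". */
int try_work_sem(struct toy_lock *l)
{
	if (toy_down_trylock(l))
		return -1;		/* lock was busy */
	/* ... critical section ... */
	l->held = false;
	return 0;
}

/* After the conversion: the test has to be negated by hand. */
int try_work_mutex(struct toy_lock *l)
{
	if (!toy_mutex_trylock(l))
		return -1;		/* lock was busy */
	/* ... critical section ... */
	l->held = false;
	return 0;
}
```

Had try_work_mutex() kept the original "if (trylock(...)) return busy" shape, it would report "busy" exactly when the lock was taken successfully and would then never release it.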
Re: Use of mutex in interrupt context flawed/impossible, need advice.
On Thu, 22 Nov 2007 17:19:44 +0100 "Leon Woestenberg" <[EMAIL PROTECTED]> wrote:
> I forgot to mention that I would like to be prepared for, and use the
> -rt patch soon. I understand (maybe wrongly?) that semaphores are not
> real-time pre-emptible, mutexes and spinlocks are.

Semaphores are preemptible, but they don't do priority inheritance.

Michal
[PATCH] proc: loadavg reading race
The avenrun[] values are supposed to be protected by xtime_lock. loadavg_read_proc does not use it. Theoretically this may result in an occasional glitch when the value read from /proc/loadavg would be as much as 1<<11 times higher than it should be.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>
---
 fs/proc/proc_misc.c |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index e0d064e..10cc9ad 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -83,10 +83,15 @@ static int loadavg_read_proc(char *page, char **start, off_t off,
 {
 	int a, b, c;
 	int len;
+	unsigned long seq;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		a = avenrun[0] + (FIXED_1/200);
+		b = avenrun[1] + (FIXED_1/200);
+		c = avenrun[2] + (FIXED_1/200);
+	} while (read_seqretry(&xtime_lock, seq));

-	a = avenrun[0] + (FIXED_1/200);
-	b = avenrun[1] + (FIXED_1/200);
-	c = avenrun[2] + (FIXED_1/200);
 	len = sprintf(page,"%d.%02d %d.%02d %d.%02d %ld/%d %d\n",
 		LOAD_INT(a), LOAD_FRAC(a),
 		LOAD_INT(b), LOAD_FRAC(b),
-- 
1.5.3.3
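As background for the FIXED_1/200 term and the LOAD_INT/LOAD_FRAC macros in the patch, here is a userspace sketch of the fixed-point formatting. The macro definitions follow the kernel's (FSHIFT is 11, so 1.0 is stored as 2048); format_load() is a helper invented for this example and is not a kernel function.

```c
#include <assert.h>
#include <stdio.h>

#define FSHIFT		11			/* bits of fractional precision */
#define FIXED_1		(1 << FSHIFT)		/* 1.0 as fixed point (2048) */
#define LOAD_INT(x)	((x) >> FSHIFT)
#define LOAD_FRAC(x)	LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Hypothetical helper: render one raw avenrun-style value as "I.FF".
 * Adding FIXED_1/200 (about 0.005) rounds to the nearest hundredth
 * instead of truncating. Returns the number of characters written. */
int format_load(char *buf, size_t len, unsigned long raw)
{
	unsigned long v = raw + FIXED_1 / 200;

	assert(buf != NULL);
	return snprintf(buf, len, "%lu.%02lu", LOAD_INT(v), LOAD_FRAC(v));
}
```

For example, a raw value of 3072 (1.5 * 2048) is rendered as "1.50", and a raw value of 0 as "0.00".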
Re: Truecrypt in kernel ?
On Mon, 5 Nov 2007 20:42:39 -0500 "Zurk Tech" <[EMAIL PROTECTED]> wrote:
> just wondering why the truecrypt module isnt in the mainline kernel ?
> its the only cross platform encrypted disk solution out there and it
> should be less of a chore to use it in linux...is there something
> wrong with the truecrypt kernel driver ?

Two reasons:
The author hasn't sent patches.
It looks to me the license is incompatible with the GPLv2.

Michal
Re: get amount of "entropy" in /dev/random ?
Yakov Lerner wrote:
> From the userlevel, can I get an estimate of "amount of entropy"
> in /dev/random, that is, the estimate of number of bytes
> readable until it blocks ? Of course multiple processes
> can read bytes and this would not be exact ... but still .. as an upper
> boundary estimate ?
>
> Thanks
> Yakov

Try ioctl(fd, RNDGETENTCNT, &entropy_count)

Michal
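A minimal sketch of that ioctl call in a complete function follows. entropy_bits() is a name made up for this example; RNDGETENTCNT comes from <linux/random.h>. Note that the value the kernel reports is an estimate in bits, not bytes, so divide by 8 for a byte count.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/random.h>	/* RNDGETENTCNT */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: return the kernel's entropy estimate (in bits)
 * for the random device at `path`, or -1 if the device cannot be
 * opened or the ioctl fails. */
int entropy_bits(const char *path)
{
	int bits = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (ioctl(fd, RNDGETENTCNT, &bits) < 0)
		bits = -1;
	close(fd);
	return bits;
}
```

A caller would typically use entropy_bits("/dev/random") and treat the result as a snapshot only, since other readers can drain the pool right after the call.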
[PATCH] pci: use pci=bfsort for HP DL385 G2, DL585 G2
Hello,

HP ProLiant systems DL385 G2 and DL585 G2 need pci=bfsort to enumerate PCI devices in the expected order.
(John, can you please confirm and ACK this?)

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index ebc6f3c..8737c53 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -287,6 +287,22 @@ static struct dmi_system_id __devinitdata pciprobe_dmi_table[] = {
 			DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant BL685c G1"),
 		},
 	},
+	{
+		.callback = set_bf_sort,
+		.ident = "HP ProLiant DL385 G2",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant DL385 G2"),
+		},
+	},
+	{
+		.callback = set_bf_sort,
+		.ident = "HP ProLiant DL585 G2",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant DL585 G2"),
+		},
+	},
 	{}
 };
Re: [PATCH] ppp_mppe: Don't put InterimKey on the stack
Matt Domsch wrote:
> On Fri, Sep 21, 2007 at 04:08:09PM +0200, Michal Schmidt wrote:
>
>> Hello,
>>
>> The interrupt stack can be in the __START_KERNEL_map region in which
>> virt_to_page will not work. This caused ppp_mppe to crash on CentOS 5 on
>> x86_64 (http://bugs.centos.org/view.php?id=2076).
>>
>> The fix is to avoid copying the interim key. We can simply use it in its
>> original place, which is kmalloc'd.
>>
>
> Needs a Signed-off-by: line, but otherwise, looks good, and even saves
> some stack space. Thanks for tracking this down.
>
> -Matt
>

Sorry about the forgotten sign-off. Here it is.
Andrew, please apply.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>
---
 drivers/net/ppp_mppe.c |   14 ++
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ppp_mppe.c b/drivers/net/ppp_mppe.c
index f79cf87..c0b6d19 100644
--- a/drivers/net/ppp_mppe.c
+++ b/drivers/net/ppp_mppe.c
@@ -136,7 +136,7 @@ struct ppp_mppe_state {
  * Key Derivation, from RFC 3078, RFC 3079.
  * Equivalent to Get_Key() for MS-CHAP as described in RFC 3079.
  */
-static void get_new_key_from_sha(struct ppp_mppe_state * state, unsigned char *InterimKey)
+static void get_new_key_from_sha(struct ppp_mppe_state * state)
 {
 	struct hash_desc desc;
 	struct scatterlist sg[4];
@@ -153,8 +153,6 @@ static void get_new_key_from_sha(struct ppp_mppe_state * state, unsigned char *I
 	desc.flags = 0;

 	crypto_hash_digest(&desc, sg, nbytes, state->sha1_digest);
-
-	memcpy(InterimKey, state->sha1_digest, state->keylen);
 }

 /*
@@ -163,21 +161,21 @@ static void get_new_key_from_sha(struct ppp_mppe_state * state, unsigned char *I
  */
 static void mppe_rekey(struct ppp_mppe_state * state, int initial_key)
 {
-	unsigned char InterimKey[MPPE_MAX_KEY_LEN];
 	struct scatterlist sg_in[1], sg_out[1];
 	struct blkcipher_desc desc = { .tfm = state->arc4 };

-	get_new_key_from_sha(state, InterimKey);
+	get_new_key_from_sha(state);
 	if (!initial_key) {
-		crypto_blkcipher_setkey(state->arc4, InterimKey, state->keylen);
-		setup_sg(sg_in, InterimKey, state->keylen);
+		crypto_blkcipher_setkey(state->arc4, state->sha1_digest,
+					state->keylen);
+		setup_sg(sg_in, state->sha1_digest, state->keylen);
 		setup_sg(sg_out, state->session_key, state->keylen);
 		if (crypto_blkcipher_encrypt(&desc, sg_out, sg_in, state->keylen) != 0) {
 			printk(KERN_WARNING "mppe_rekey: cipher_encrypt failed\n");
 		}
 	} else {
-		memcpy(state->session_key, InterimKey, state->keylen);
+		memcpy(state->session_key, state->sha1_digest, state->keylen);
 	}
 	if (state->keylen == 8) {
 		/* See RFC 3078 */
[PATCH] ppp_mppe: Don't put InterimKey on the stack
Hello,

The interrupt stack can be in the __START_KERNEL_map region in which virt_to_page will not work. This caused ppp_mppe to crash on CentOS 5 on x86_64 (http://bugs.centos.org/view.php?id=2076).

The fix is to avoid copying the interim key. We can simply use it in its original place, which is kmalloc'd.

Michal

---
 drivers/net/ppp_mppe.c |   14 ++
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ppp_mppe.c b/drivers/net/ppp_mppe.c
index f79cf87..c0b6d19 100644
--- a/drivers/net/ppp_mppe.c
+++ b/drivers/net/ppp_mppe.c
@@ -136,7 +136,7 @@ struct ppp_mppe_state {
  * Key Derivation, from RFC 3078, RFC 3079.
  * Equivalent to Get_Key() for MS-CHAP as described in RFC 3079.
  */
-static void get_new_key_from_sha(struct ppp_mppe_state * state, unsigned char *InterimKey)
+static void get_new_key_from_sha(struct ppp_mppe_state * state)
 {
 	struct hash_desc desc;
 	struct scatterlist sg[4];
@@ -153,8 +153,6 @@ static void get_new_key_from_sha(struct ppp_mppe_state * state, unsigned char *I
 	desc.flags = 0;

 	crypto_hash_digest(&desc, sg, nbytes, state->sha1_digest);
-
-	memcpy(InterimKey, state->sha1_digest, state->keylen);
 }

 /*
@@ -163,21 +161,21 @@ static void get_new_key_from_sha(struct ppp_mppe_state * state, unsigned char *I
  */
 static void mppe_rekey(struct ppp_mppe_state * state, int initial_key)
 {
-	unsigned char InterimKey[MPPE_MAX_KEY_LEN];
 	struct scatterlist sg_in[1], sg_out[1];
 	struct blkcipher_desc desc = { .tfm = state->arc4 };

-	get_new_key_from_sha(state, InterimKey);
+	get_new_key_from_sha(state);
 	if (!initial_key) {
-		crypto_blkcipher_setkey(state->arc4, InterimKey, state->keylen);
-		setup_sg(sg_in, InterimKey, state->keylen);
+		crypto_blkcipher_setkey(state->arc4, state->sha1_digest,
+					state->keylen);
+		setup_sg(sg_in, state->sha1_digest, state->keylen);
 		setup_sg(sg_out, state->session_key, state->keylen);
 		if (crypto_blkcipher_encrypt(&desc, sg_out, sg_in, state->keylen) != 0) {
 			printk(KERN_WARNING "mppe_rekey: cipher_encrypt failed\n");
 		}
 	} else {
-		memcpy(state->session_key, InterimKey, state->keylen);
+		memcpy(state->session_key, state->sha1_digest, state->keylen);
 	}
 	if (state->keylen == 8) {
 		/* See RFC 3078 */
Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent
Vitaly Mayatskikh wrote:
> Short-living process returns its timeslice to the parent, this affects
> process that creates a lot of such short-living threads, because its
> not a parent for new threads.

I don't see the point of sending patches for old Linux versions such as 2.6.21, unless it's something applicable to the -stable tree. Do recent kernels with CFS have the same problem?

> Patch fixes this issue and doesn't break kabi as does the patch from
> reporter: http://lkml.org/lkml/2007/4/7/21

There's no kabi.

Michal
Re: nanosleep() accuracy
GolovaSteek wrote:
> 2007/8/17, Michal Schmidt <[EMAIL PROTECTED]>:
>
>> GolovaSteek wrote:
>>
>>> Hello!
>>> I need use sleep with accurat timing.
>>> I use 2.6.21 with rt-prempt patch.
>>> with enabled rt_preempt, dyn_ticks, and local_apic
>>> But
>>>
>>> req.tv_nsec = 300000;
>>> req.tv_sec = 0;
>>> nanosleep(&req,NULL)
>>>
>>> make pause around 310-330 microseconds.
>>>
>> How do you measure this?
>> If you want to have something done every 300 microseconds, you must not
>> sleep for 300 microseconds in each iteration, because you'd accumulate
>> errors. Use a periodic timer or use the current time to compute how long
>> to sleep in each iteration. Take a look how cyclictest does it.
>>
>
> no. I just want my programm go to sleep sometimes and wake up in correct time.
>

What does your program do that it has such a strict requirement on the
exact length of sleeping?

>>> I tried to understend how work nanosleep(), but it not depends from
>>> jiffies and from smp_apic_timer_interrupt.
>>>
>>> When can accuracy be lost?
>>> And how are process waked up?
>>>
>>>
>>> GolovaSteek
>>>
>> Don't forget the process will always have non-zero wakeup latency. It
>> takes some time to process an interrupt, wakeup the process and schedule
>> it to run on the CPU. 10-30 microseconds is not unreasonable.
>>
>
> But 2 operations can be done in 10 microseconds?
> and why is there that inconstancy? Why sametimes 10 and sometimes 30?
> In which points of implementation it happens?
>
> GolovaSteek
>

If a jitter of 20 microseconds is unacceptable for your application, don't
use PC hardware. Consider using a microcontroller.

Michal
Re: nanosleep() accuracy
GolovaSteek wrote:
> Hello!
> I need use sleep with accurat timing.
> I use 2.6.21 with rt-prempt patch.
> with enabled rt_preempt, dyn_ticks, and local_apic
> But
>
> req.tv_nsec = 300000;
> req.tv_sec = 0;
> nanosleep(&req,NULL)
>
> make pause around 310-330 microseconds.

How do you measure this?
If you want to have something done every 300 microseconds, you must not
sleep for 300 microseconds in each iteration, because you'd accumulate
errors. Use a periodic timer or use the current time to compute how long
to sleep in each iteration. Take a look at how cyclictest does it.

> I tried to understend how work nanosleep(), but it not depends from
> jiffies and from smp_apic_timer_interrupt.
>
> When can accuracy be lost?
> And how are process waked up?
>
> GolovaSteek

Don't forget the process will always have non-zero wakeup latency. It
takes some time to process an interrupt, wake up the process and schedule
it to run on the CPU. 10-30 microseconds is not unreasonable.

Michal
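The advice above (compute each wakeup from a fixed schedule instead of sleeping a fixed amount every iteration) can be sketched roughly as follows. This is a minimal user-space illustration, not cyclictest's actual code: the helper names timespec_add_ns() and run_periodic() are made up for this example, and it assumes a POSIX system that provides clock_nanosleep() with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds, keeping tv_nsec in [0, NSEC_PER_SEC). */
static void timespec_add_ns(struct timespec *t, long ns)
{
	assert(ns >= 0);
	t->tv_nsec += ns;
	while (t->tv_nsec >= NSEC_PER_SEC) {
		t->tv_nsec -= NSEC_PER_SEC;
		t->tv_sec++;
	}
}

/*
 * Call work() once per period_ns nanoseconds, 'iterations' times.
 * Every deadline is derived from the previous deadline, not from the
 * time the task actually woke up, so wakeup latency does not add up.
 * Returns 0 on success, -1 if the clock cannot be read, or the error
 * number reported by clock_nanosleep().
 */
static int run_periodic(long period_ns, int iterations, void (*work)(void))
{
	struct timespec next;
	int i, err;

	assert(period_ns > 0 && period_ns < NSEC_PER_SEC);
	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;

	for (i = 0; i < iterations; i++) {
		timespec_add_ns(&next, period_ns);
		/* Sleep until the absolute deadline; retry if a signal interrupts. */
		do {
			err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					      &next, NULL);
		} while (err == EINTR);
		if (err)
			return err;
		if (work)
			work();
	}
	return 0;
}
```

A call such as run_periodic(300000L, 1000, do_work) would then aim for one wakeup every 300 microseconds. Each individual wakeup still sees the scheduling latency discussed above, but the error is not carried into later iterations.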
Re: [PATCH] destroy_workqueue() can livelock
Oleg Nesterov wrote:
> Pointed out by Michal Schmidt <[EMAIL PROTECTED]>.
>
> The bug was introduced in 2.6.22 by me.
>
> cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
> ->worklist becomes empty. This is live-lockable, a re-niced caller can
> get CPU after wake_up() and insert a new barrier before the lower-priority
> cwq->thread has a chance to clear ->current_work.
>
> Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
> We can rely on the fact that run_workqueue() won't return until it flushes
> all works. So it is safe to call kthread_stop() after that, the "should stop"
> request won't be noticed until run_workqueue() returns.
>
> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

I confirm the patch fixes the bug I was seeing.

Michal
destroy_workqueue can livelock
Hi,

While using SystemTap I noticed an interesting situation. When my stap
probe was exiting, there was a several seconds long delay, during which
the CPU was 100% loaded. I narrowed the problem down to
destroy_workqueue.

The attached module is a minimized testcase. To reproduce it, load the
module and then try to rmmod it from a higher priority process:

  nice -n -10 rmmod wqtest.ko   # that's how SystemTap's staprun behaves

or:

  chrt -f 90 rmmod wqtest.ko    # this may be more reliably reproducible

I tested it (with "nice") on Linux 2.6.22. The rmmod process took about
55% CPU, the workqueue thread consumed the rest. This situation can last
for minutes. As soon as the rmmod process is reniced to 0, the workqueue
is destroyed successfully and the module is unloaded.

Here's what happens in detail:

When rmmod executes cancel_rearming_delayed_workqueue() ->
wait_on_work() -> wait_on_cpu_work(), the work is the current_work on
the workqueue (it's in ssleep(1)). So wait_on_cpu_work() inserts a
wq_barrier on the workqueue and waits for the completion.

As soon as wq_barrier_func signals the completion, it is most likely
preempted by the rmmod process. At this moment, the worklist is already
empty, but cwq->current_work still points to the barrier.
run_workqueue() didn't get to reset it to NULL yet.

Now rmmod calls destroy_workqueue() -> cleanup_workqueue_thread() ->
flush_cpu_workqueue(). Because cwq->current_work!=NULL it decides to
insert another wq_barrier and wait for it to complete. But
cwq->current_work will never be reset to NULL, so
cleanup_workqueue_thread() keeps trying flush_cpu_workqueue()
indefinitely, inserting wq_barriers and waiting for them.

If rmmod's priority is lowered, run_workqueue() will not be preempted by
it and manages to reset cwq->current_work. This ends the livelock.

Can this be fixed? Or is it just a case of "Don't do that then!"?
("that" meaning destroying workqueues from negatively reniced processes)

Michal

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>
#include <linux/wait.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michal Schmidt");

static void wq_func(struct work_struct *w);
static DECLARE_DELAYED_WORK(wq_work, wq_func);
static struct workqueue_struct *wq;
static DECLARE_WAIT_QUEUE_HEAD(ctl_wq);

static void wq_func(struct work_struct *w)
{
	/*
	 * So that this work is most likely cwq->current_work
	 * when destroy_workqueue comes...
	 */
	ssleep(1);
	queue_delayed_work(wq, &wq_work, HZ/100);
}

static int wqtest_start(void)
{
	wq = create_workqueue("wqtest");
	if (!wq)
		return -1;
	queue_delayed_work(wq, &wq_work, HZ/100);
	return 0;
}

static void wqtest_stop(void)
{
	printk(KERN_CRIT "wqtest: cancelling the work\n");
	cancel_rearming_delayed_work(&wq_work);
	printk(KERN_CRIT "wqtest: destroying the wq\n");
	destroy_workqueue(wq);
	printk(KERN_CRIT "wqtest: done\n");
}

module_init(wqtest_start);
module_exit(wqtest_stop);
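The sequence described in the report can be modelled in a few lines of user-space C. This is only a toy model under simplifying assumptions, not kernel code: struct cwq_model, flush_once(), worker_run() and cleanup_loop() are invented names, scheduling is reduced to a single "preempted" flag, and the loop is capped so the livelock case terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU workqueue state. */
struct cwq_model {
	int  queued;        /* entries on the worklist */
	bool current_work;  /* worker has not cleared current_work yet */
};

/* Models flush_cpu_workqueue(): true if a barrier had to be inserted. */
static bool flush_once(struct cwq_model *q)
{
	if (q->queued == 0 && !q->current_work)
		return false;
	q->queued++;            /* insert a barrier and wait for it */
	return true;
}

/*
 * Models the worker thread running the worklist. With preempted set,
 * the worker loses the CPU right after the last item signals its
 * completion, before it gets to clear current_work.
 */
static void worker_run(struct cwq_model *q, bool preempted)
{
	while (q->queued > 0) {
		q->queued--;
		q->current_work = true;
		if (q->queued == 0 && preempted)
			return;
		q->current_work = false;
	}
}

/* Models the flush-until-idle loop; returns the number of flushes done. */
static int cleanup_loop(struct cwq_model *q, bool preempted, int cap)
{
	int flushes = 0;

	assert(cap > 0);
	while (flushes < cap && flush_once(q)) {
		flushes++;
		worker_run(q, preempted);
	}
	return flushes;
}
```

Starting from { .queued = 0, .current_work = true }, the loop needs one flush when the worker is allowed to finish, and runs into the cap when the worker is always preempted before clearing current_work, which is the behaviour seen with the reniced rmmod.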
Re: Need help making sense of IRQ API
LOL ER wrote:
> Hello,
> I've been trying to make sense of how the kernel (on an i386) calls
> __do_IRQ() from do_IRQ() for the past few days to no avail.
[...]

Since i386 was switched to the generic-IRQ architecture (see "Linux
generic IRQ handling" in Documentation/DocBook) it does not use
__do_IRQ(). common_interrupt (in assembler) calls do_IRQ(), which calls
desc->handle_irq(), which is usually one of:

  handle_fasteoi_irq()
  handle_level_irq()
  handle_edge_irq()

Michal
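The dispatch described above boils down to a per-interrupt function pointer. The following is a simplified user-space sketch of that idea, not the kernel's real data structures: every name prefixed with model_ or MODEL_ is hypothetical, and the real flow handlers do far more (acking, masking, calling the device's action handlers).

```c
#include <assert.h>
#include <stddef.h>

enum model_flow { MODEL_FLOW_NONE, MODEL_FLOW_EDGE, MODEL_FLOW_FASTEOI };

struct model_irq_desc;
typedef void (*model_flow_handler_t)(unsigned int irq,
				     struct model_irq_desc *desc);

/* Cut-down descriptor: one flow handler per interrupt line. */
struct model_irq_desc {
	model_flow_handler_t handle_irq;
	enum model_flow last_flow;   /* which handler ran last */
	unsigned int count;          /* how many times it ran */
};

static void model_handle_edge(unsigned int irq, struct model_irq_desc *desc)
{
	(void)irq;
	desc->last_flow = MODEL_FLOW_EDGE;
	desc->count++;
}

static void model_handle_fasteoi(unsigned int irq, struct model_irq_desc *desc)
{
	(void)irq;
	desc->last_flow = MODEL_FLOW_FASTEOI;
	desc->count++;
}

#define MODEL_NR_IRQS 4

static struct model_irq_desc model_irq_desc[MODEL_NR_IRQS] = {
	[0] = { model_handle_edge,    MODEL_FLOW_NONE, 0 },
	[1] = { model_handle_fasteoi, MODEL_FLOW_NONE, 0 },
};

/* Entry point: look up the descriptor and call its flow handler. */
static void model_do_IRQ(unsigned int irq)
{
	struct model_irq_desc *desc;

	assert(irq < MODEL_NR_IRQS);
	desc = &model_irq_desc[irq];
	if (desc->handle_irq)
		desc->handle_irq(irq, desc);
}
```

The point of the design is that the entry code never needs to know whether a line is edge or level triggered; whoever sets up the interrupt picks the handler once, and the hot path is a single indirect call.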
Re: [PATCH -rt] irq nobody cared workaround for i386
Steven Rostedt wrote: > Michal Schmidt wrote: > >> I came to the conclusion that the IO-APICs which need the fix for the >> nobody cared bug don't have the issue ack_ioapic_quirk_irq is designed >> to work-around. It should be safe simply to use the normal >> ack_ioapic_irq as the .eoi method in pcix_ioapic_chip. >> So this is the port of Steven's fix for the nobody cared bug to i386. It >> works fine on IBM LS21 I have access to. >> >> > You want to make that "apic > 0". Note the spacing. If it breaks > 80 characters, then simply put it to a new line. > > [...] > ACK > > -- Steve > OK, I fixed the spacing in both occurences. Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]> --- arch/i386/kernel/io_apic.c.orig 2007-06-19 08:40:05.0 -0400 +++ arch/i386/kernel/io_apic.c 2007-06-21 06:51:16.0 -0400 @@ -261,6 +261,18 @@ static void __unmask_IO_APIC_irq (unsign __modify_IO_APIC_irq(irq, 0, 0x0001); } +/* trigger = 0 (edge mode) */ +static void __pcix_mask_IO_APIC_irq (unsigned int irq) +{ + __modify_IO_APIC_irq(irq, 0, 0x8000); +} + +/* mask = 0, trigger = 1 (level mode) */ +static void __pcix_unmask_IO_APIC_irq (unsigned int irq) +{ + __modify_IO_APIC_irq(irq, 0x8000, 0x0001); +} + static void mask_IO_APIC_irq (unsigned int irq) { unsigned long flags; @@ -279,6 +291,24 @@ static void unmask_IO_APIC_irq (unsigned spin_unlock_irqrestore(&ioapic_lock, flags); } +static void pcix_mask_IO_APIC_irq (unsigned int irq) +{ + unsigned long flags; + + spin_lock_irqsave(&ioapic_lock, flags); + __pcix_mask_IO_APIC_irq(irq); + spin_unlock_irqrestore(&ioapic_lock, flags); +} + +static void pcix_unmask_IO_APIC_irq (unsigned int irq) +{ + unsigned long flags; + + spin_lock_irqsave(&ioapic_lock, flags); + __pcix_unmask_IO_APIC_irq(irq); + spin_unlock_irqrestore(&ioapic_lock, flags); +} + static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) { struct IO_APIC_route_entry entry; @@ -1257,22 +1287,27 @@ static int assign_irq_vector(int irq) return vector; } + static struct 
irq_chip ioapic_chip; +static struct irq_chip pcix_ioapic_chip; #define IOAPIC_AUTO-1 #define IOAPIC_EDGE0 #define IOAPIC_LEVEL 1 -static void ioapic_register_intr(int irq, int vector, unsigned long trigger) +static void ioapic_register_intr(int irq, int vector, unsigned long trigger, +int pcix) { + struct irq_chip *chip = pcix ? &pcix_ioapic_chip : &ioapic_chip; + if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || trigger == IOAPIC_LEVEL) - set_irq_chip_and_handler_name(irq, &ioapic_chip, -handle_fasteoi_irq, "fasteoi"); - else { - set_irq_chip_and_handler_name(irq, &ioapic_chip, -handle_edge_irq, "edge"); - } + set_irq_chip_and_handler_name(irq, chip, handle_fasteoi_irq, + pcix ? "pcix-fasteoi" : "fasteoi"); + else + set_irq_chip_and_handler_name(irq, chip, handle_edge_irq, + pcix ? "pcix-edge" : "edge"); + set_intr_gate(vector, interrupt[irq]); } @@ -1336,7 +1371,8 @@ static void __init setup_IO_APIC_irqs(vo if (IO_APIC_IRQ(irq)) { vector = assign_irq_vector(irq); entry.vector = vector; - ioapic_register_intr(irq, vector, IOAPIC_AUTO); + ioapic_register_intr(irq, vector, IOAPIC_AUTO, +apic > 0); if (!apic && (irq < 16)) disable_8259A_irq(irq); @@ -2058,6 +2094,18 @@ static struct irq_chip ioapic_chip __rea .retrigger = ioapic_retrigger_irq, }; +static struct irq_chip pcix_ioapic_chip __read_mostly = { + .name = "IO-APIC", + .startup= startup_ioapic_irq, + .mask = pcix_mask_IO_APIC_irq, + .unmask = pcix_unmask_IO_APIC_irq, + .ack= ack_ioapic_irq, + .eoi= ack_ioapic_irq, +#ifdef CONFIG_SMP + .set_affinity = set_ioapic_affinity_irq, +#endif + .retrigger = ioapic_retrigger_irq, +}; static inline void init_IO_APIC_traps(void) { @@ -2858,7 +2906,7 @@ int io_apic_set_pci_routing (int ioapic, mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq, edge_level, active_high_low); - ioapic_register_intr(irq, entry.vec
Re: [PATCH -rt] irq nobody cared workaround for i386
Michal Schmidt wrote:
> Steven Rostedt wrote:
>
>> This is the final "design" for the nobody cared bug. For all IO-APICS
>> other than the first one (the chained IO-APICS) we use the PCIX version
>> of the mask and unmask interrupt routines. This changes the interrupt
>> from level to edge for mask and edge to level for unmask. This keeps the
>> PCI-E from thinking it's in legacy mode and assert an old fashion INT#
>> interrupt which might spread to other interrupts.
>>
>
> Here's a port of the workaround to i386. I tested it successfully on IBM
> LS21.
> Notice I had to disable the quirk handling in ack_ioapic_quirk_irq. The
> code path was triggering on LS21 and because it plays with the Interrupt
> Mask bit, it produced the doubled interrupts again. I don't like it and
> I need to think about a solution which would handle both quirks correctly.
>

I came to the conclusion that the IO-APICs which need the fix for the
nobody cared bug don't have the issue ack_ioapic_quirk_irq is designed
to work around. It should be safe simply to use the normal
ack_ioapic_irq as the .eoi method in pcix_ioapic_chip.

So this is the port of Steven's fix for the nobody cared bug to i386. It
works fine on the IBM LS21 I have access to.
Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]> --- arch/i386/kernel/io_apic.c.orig 2007-06-19 08:40:05.0 -0400 +++ arch/i386/kernel/io_apic.c 2007-06-20 09:03:55.0 -0400 @@ -261,6 +261,18 @@ static void __unmask_IO_APIC_irq (unsign __modify_IO_APIC_irq(irq, 0, 0x0001); } +/* trigger = 0 (edge mode) */ +static void __pcix_mask_IO_APIC_irq (unsigned int irq) +{ + __modify_IO_APIC_irq(irq, 0, 0x8000); +} + +/* mask = 0, trigger = 1 (level mode) */ +static void __pcix_unmask_IO_APIC_irq (unsigned int irq) +{ + __modify_IO_APIC_irq(irq, 0x8000, 0x0001); +} + static void mask_IO_APIC_irq (unsigned int irq) { unsigned long flags; @@ -279,6 +291,24 @@ static void unmask_IO_APIC_irq (unsigned spin_unlock_irqrestore(&ioapic_lock, flags); } +static void pcix_mask_IO_APIC_irq (unsigned int irq) +{ + unsigned long flags; + + spin_lock_irqsave(&ioapic_lock, flags); + __pcix_mask_IO_APIC_irq(irq); + spin_unlock_irqrestore(&ioapic_lock, flags); +} + +static void pcix_unmask_IO_APIC_irq (unsigned int irq) +{ + unsigned long flags; + + spin_lock_irqsave(&ioapic_lock, flags); + __pcix_unmask_IO_APIC_irq(irq); + spin_unlock_irqrestore(&ioapic_lock, flags); +} + static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) { struct IO_APIC_route_entry entry; @@ -1257,22 +1287,27 @@ static int assign_irq_vector(int irq) return vector; } + static struct irq_chip ioapic_chip; +static struct irq_chip pcix_ioapic_chip; #define IOAPIC_AUTO-1 #define IOAPIC_EDGE0 #define IOAPIC_LEVEL 1 -static void ioapic_register_intr(int irq, int vector, unsigned long trigger) +static void ioapic_register_intr(int irq, int vector, unsigned long trigger, +int pcix) { + struct irq_chip *chip = pcix ? 
&pcix_ioapic_chip : &ioapic_chip; + if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || trigger == IOAPIC_LEVEL) - set_irq_chip_and_handler_name(irq, &ioapic_chip, -handle_fasteoi_irq, "fasteoi"); - else { - set_irq_chip_and_handler_name(irq, &ioapic_chip, -handle_edge_irq, "edge"); - } + set_irq_chip_and_handler_name(irq, chip, handle_fasteoi_irq, + pcix ? "pcix-fasteoi" : "fasteoi"); + else + set_irq_chip_and_handler_name(irq, chip, handle_edge_irq, + pcix ? "pcix-edge" : "edge"); + set_intr_gate(vector, interrupt[irq]); } @@ -1336,7 +1371,7 @@ static void __init setup_IO_APIC_irqs(vo if (IO_APIC_IRQ(irq)) { vector = assign_irq_vector(irq); entry.vector = vector; - ioapic_register_intr(irq, vector, IOAPIC_AUTO); + ioapic_register_intr(irq, vector, IOAPIC_AUTO, apic>0); if (!apic && (irq < 16)) disable_8259A_irq(irq); @@ -2058,6 +2093,18 @@ static struct irq_chip ioapic_chip __rea .retrigger = ioapic_retrigger_irq, }; +static struct irq_chip pcix_ioapic_chip __read_mostly = { + .name = "IO-APIC", + .startup= startup_ioapic_irq, +
[PATCH -rt] irq nobody cared workaround for i386
Steven Rostedt wrote: > This is the final "design" for the nobody cared bug. For all IO-APICS > other than the first one (the chained IO-APICS) we use the PCIX version > of the mask and unmask interrupt routines. This changes the interrupt > from level to edge for mask and edge to level for unmask. This keeps the > PCI-E from thinking it's in legacy mode and assert an old fashion INT# > interrupt which might spread to other interrupts. > > Here's a port of the workaround to i386. I tested it successfully on IBM LS21. Notice I had to disable the quirk handling in ack_ioapic_quirk_irq. The code path was triggering on LS21 and because it plays with the Interrupt Mask bit, it produced the doubled interrupts again. I don't like it and I need to think about a solution which would handle both quirks correctly. Michal --- arch/i386/kernel/io_apic.c.orig 2007-06-19 08:40:05.0 -0400 +++ arch/i386/kernel/io_apic.c 2007-06-19 08:58:00.0 -0400 @@ -261,6 +261,18 @@ static void __unmask_IO_APIC_irq (unsign __modify_IO_APIC_irq(irq, 0, 0x0001); } +/* trigger = 0 (edge mode) */ +static void __pcix_mask_IO_APIC_irq (unsigned int irq) +{ + __modify_IO_APIC_irq(irq, 0, 0x8000); +} + +/* mask = 0, trigger = 1 (level mode) */ +static void __pcix_unmask_IO_APIC_irq (unsigned int irq) +{ + __modify_IO_APIC_irq(irq, 0x8000, 0x0001); +} + static void mask_IO_APIC_irq (unsigned int irq) { unsigned long flags; @@ -279,6 +291,24 @@ static void unmask_IO_APIC_irq (unsigned spin_unlock_irqrestore(&ioapic_lock, flags); } +static void pcix_mask_IO_APIC_irq (unsigned int irq) +{ + unsigned long flags; + + spin_lock_irqsave(&ioapic_lock, flags); + __pcix_mask_IO_APIC_irq(irq); + spin_unlock_irqrestore(&ioapic_lock, flags); +} + +static void pcix_unmask_IO_APIC_irq (unsigned int irq) +{ + unsigned long flags; + + spin_lock_irqsave(&ioapic_lock, flags); + __pcix_unmask_IO_APIC_irq(irq); + spin_unlock_irqrestore(&ioapic_lock, flags); +} + static void clear_IO_APIC_pin(unsigned int apic, unsigned int 
pin) { struct IO_APIC_route_entry entry; @@ -1257,22 +1287,27 @@ static int assign_irq_vector(int irq) return vector; } + static struct irq_chip ioapic_chip; +static struct irq_chip pcix_ioapic_chip; #define IOAPIC_AUTO-1 #define IOAPIC_EDGE0 #define IOAPIC_LEVEL 1 -static void ioapic_register_intr(int irq, int vector, unsigned long trigger) +static void ioapic_register_intr(int irq, int vector, unsigned long trigger, +int pcix) { + struct irq_chip *chip = pcix ? &pcix_ioapic_chip : &ioapic_chip; + if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || trigger == IOAPIC_LEVEL) - set_irq_chip_and_handler_name(irq, &ioapic_chip, -handle_fasteoi_irq, "fasteoi"); - else { - set_irq_chip_and_handler_name(irq, &ioapic_chip, -handle_edge_irq, "edge"); - } + set_irq_chip_and_handler_name(irq, chip, handle_fasteoi_irq, + pcix ? "pcix-fasteoi" : "fasteoi"); + else + set_irq_chip_and_handler_name(irq, chip, handle_edge_irq, + pcix ? "pcix-edge" : "edge"); + set_intr_gate(vector, interrupt[irq]); } @@ -1336,7 +1371,7 @@ static void __init setup_IO_APIC_irqs(vo if (IO_APIC_IRQ(irq)) { vector = assign_irq_vector(irq); entry.vector = vector; - ioapic_register_intr(irq, vector, IOAPIC_AUTO); + ioapic_register_intr(irq, vector, IOAPIC_AUTO, apic>0); if (!apic && (irq < 16)) disable_8259A_irq(irq); @@ -2027,6 +2062,7 @@ static void ack_ioapic_quirk_irq(unsigne ack_APIC_irq(); +#if 0 if (!(v & (1 << (i & 0x1f { atomic_inc(&irq_mis_count); spin_lock(&ioapic_lock); @@ -2036,6 +2072,7 @@ static void ack_ioapic_quirk_irq(unsigne __modify_IO_APIC_irq(irq, 0x8000, 0x0001); spin_unlock(&ioapic_lock); } +#endif } static int ioapic_retrigger_irq(unsigned int irq) @@ -2058,6 +2095,18 @@ static struct irq_chip ioapic_chip __rea .retrigger = ioapic_retrigger_irq, }; +static struct irq_chip pcix_ioapic_chip __read_mostly = { + .name = "IO-APIC", + .startup= startup_ioapic_irq, + .mask = pcix_mask_IO_APIC_irq, + .unmask = pcix_unmask_IO_APIC_irq, + .ack= ack_ioapic_irq, + .eoi= 
ack_ioapic_quirk_irq, +#ifdef CONFIG_SMP + .set_affinity = set_ioapic_affinity_irq, +#endif +
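The mask/unmask behaviour Steven describes (switch to edge while masking, back to level while unmasking) can be illustrated on a plain 32-bit value. This is a hedged user-space sketch, not the patch itself: it assumes the usual IO-APIC redirection entry layout where bit 16 is the mask bit and bit 15 is the trigger mode (1 = level), and the names rte_modify(), pcix_mask_rte() and pcix_unmask_rte() are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_MASKED (UINT32_C(1) << 16)  /* interrupt mask bit */
#define RTE_LEVEL  (UINT32_C(1) << 15)  /* trigger mode: 1 = level, 0 = edge */

/* Read-modify-write helper: clear the 'disable' bits, set the 'enable' bits. */
static uint32_t rte_modify(uint32_t rte, uint32_t enable, uint32_t disable)
{
	assert((enable & disable) == 0);
	rte &= ~disable;
	rte |= enable;
	return rte;
}

/* Mask: set the mask bit and switch the entry to edge mode. */
static uint32_t pcix_mask_rte(uint32_t rte)
{
	return rte_modify(rte, RTE_MASKED, RTE_LEVEL);
}

/* Unmask: clear the mask bit and switch the entry back to level mode. */
static uint32_t pcix_unmask_rte(uint32_t rte)
{
	return rte_modify(rte, RTE_LEVEL, RTE_MASKED);
}
```

All other bits of the entry (vector, delivery mode, polarity) pass through untouched, which is why a single helper with an enable mask and a disable mask is enough for both directions.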
Re: ZFS with Linux: An Open Plea
linux-os (Dick Johnson) wrote:
> if you never look at somebody else's
> implementation details, you certainly should not be violating a patent.

Oh, it would be a beautiful world in which this was true!

Michal
Re: [PATCH 58/59] sysctl: Reimplement the sysctl proc support
Ingo Molnar wrote: > * Alexey Dobriyan <[EMAIL PROTECTED]> wrote: > >> Use register_sysctl_table() for sysctls. >> > > yes - i just wanted to point out the incompatibility and subtle breakage > that this change caused. I'll now have to convert the current code over > to sysctl_table, which isnt that hard but not trivial either, and i > certainly could make use that time for other purposes. > > Ingo > How about this? It works for me. Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]> diff --git a/kernel/latency_trace.c b/kernel/latency_trace.c index e07bb95..a13d001 100644 --- a/kernel/latency_trace.c +++ b/kernel/latency_trace.c @@ -19,7 +19,7 @@ #include #include #include -#include +#include #include #include #include @@ -2661,66 +2661,94 @@ void print_traces(struct task_struct *task) } #endif -static int preempt_read_proc(char *page, char **start, off_t off, -int count, int *eof, void *data) +#if defined(CONFIG_WAKEUP_TIMING) || defined(CONFIG_EVENT_TRACE) + +static int preempt_proc_handler(ctl_table *table, int write, struct file *filp, + void __user *buffer, size_t *lenp, loff_t *ppos) { - cycle_t *max = data; +#define TMPBUFLEN 21 + char buf[TMPBUFLEN]; + size_t left = *lenp; + cycle_t *max = table->data; - return sprintf(page, "%ld\n", cycles_to_usecs(*max)); -} + if (!table->data || table->maxlen!=sizeof(cycles_t) || !*lenp || + (*ppos && !write)) { + *lenp = 0; + return 0; + } -static int preempt_write_proc(struct file *file, const char __user *buffer, - unsigned long count, void *data) -{ - unsigned int c, done = 0, val, sum = 0; - cycle_t *max = data; + if (!write) { + int len; - while (count) { - if (get_user(c, buffer)) - return -EFAULT; - val = c - '0'; - buffer++; - done++; - count--; - if (c == 0 || c == '\n') - break; - if (val > 9) + len = snprintf(buf, TMPBUFLEN, "%ld\n", cycles_to_usecs(*max)); + if (len >= TMPBUFLEN) return -EINVAL; - sum *= 10; - sum += val; + if (len > left) + len = left; + if (copy_to_user(buffer, buf, len)) + return -EFAULT; 
+ left -= len; + } else { + unsigned int c, val, sum = 0; + + while (left) { + if (get_user(c, (char __user *)buffer)) + return -EFAULT; + val = c - '0'; + buffer++; + left--; + if (c == 0 || c == '\n') + break; + if (val > 9) + return -EINVAL; + sum *= 10; + sum += val; + } + *max = usecs_to_cycles(sum); } - *max = usecs_to_cycles(sum); - return done; + + *lenp -= left; + *ppos += *lenp; + return 0; } -#if defined(CONFIG_WAKEUP_TIMING) || defined(CONFIG_EVENT_TRACE) +static ctl_table preempt_latency_table[] = { + { + .ctl_name = CTL_UNNUMBERED, + .procname = "preempt_max_latency", + .data = &preempt_max_latency, + .maxlen = sizeof(cycles_t), + .mode = 0644, + .proc_handler = &preempt_proc_handler, + }, +#ifdef CONFIG_EVENT_TRACE + { + .ctl_name = CTL_UNNUMBERED, + .procname = "preempt_thresh", + .data = &preempt_thresh, + .maxlen = sizeof(cycles_t), + .mode = 0644, + .proc_handler = &preempt_proc_handler, + }, +#endif + { .ctl_name = 0 } +}; -#definePROCNAME_PML"sys/kernel/preempt_max_latency" -#define PROCNAME_PT"sys/kernel/preempt_thresh" +static ctl_table kernel_root[] = { + { + .ctl_name = CTL_KERN, + .procname = "kernel", + .mode = 0555, + .child = preempt_latency_table, + }, + { .ctl_name = 0 } +}; + +static struct ctl_table_header *sysctl_header; static __init int latency_fs_init(void) { - struct proc_dir_entry *entry; - - if (!(entry = create_proc_entry(PROCNAME_PML, 0644, NULL)
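The write side of the handler above is a small hand-rolled decimal parser. A user-space sketch of the same loop is shown below for illustration; parse_usecs() is a hypothetical name, the input is an ordinary buffer instead of a __user pointer, and like the loop in the patch it does not guard against overflow of the accumulated value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Parse decimal digits from buf[0..len). Parsing stops at a NUL byte,
 * a newline, or the end of the buffer. Returns 0 and stores the value
 * in *out, or -EINVAL if a non-digit character is found first.
 */
static int parse_usecs(const char *buf, size_t len, unsigned int *out)
{
	unsigned int sum = 0;
	size_t i;

	assert(buf != NULL && out != NULL);
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];

		if (c == '\0' || c == '\n')
			break;
		if (c < '0' || c > '9')
			return -EINVAL;
		sum = sum * 10 + (unsigned int)(c - '0');
	}
	*out = sum;
	return 0;
}
```

An empty write parses as 0 here, which matches the loop in the patch: nothing is accumulated before the terminator is seen.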
[PATCH -rt] fix preempt count underflow in user_trace_stop
When playing with trace_user_trigger_irq in order to trace
IRQ->userspace latencies, I encountered a bug in the latency tracer. If
I have wakeup_timing enabled and attempt to stop the trace in my
userspace program, the system crashes. This is caused by an unbalanced
preempt_enable() which underflows the preempt count.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff --git a/kernel/latency_trace.c b/kernel/latency_trace.c
index e07bb95..29dfb79 100644
--- a/kernel/latency_trace.c
+++ b/kernel/latency_trace.c
@@ -2396,7 +2396,6 @@ long user_trace_stop(void)
 	if (current != sch.task) {
 		__raw_spin_unlock(&sch.trace_lock);
 		local_irq_restore(flags);
-		preempt_enable();
 		return -EINVAL;
 	}
 	sch.task = NULL;
[PATCH -rt] airo: threaded IRQ handler sleeps forever
The airo driver tries to avoid excessive latencies when issuing commands
to the card by calling schedule() after several retries. But
issuecommand() can be run from an interrupt handler. The function is
careful enough to check with in_atomic() if it is safe to call
schedule(). This check breaks when the interrupt handler is threaded,
because then in_atomic() is always false there. The handler is run as
TASK_INTERRUPTIBLE, so schedule() takes it off the runqueue and it never
wakes up again.

Here's an obvious fix - simply don't call schedule() when using
preemptible hardirqs.

An improved solution might be to identify the commands that take so long
to issue and avoid sending them from the interrupt handler. In my
testing there was only one such command: CMD_ACCESS. I need to
investigate if it's always possible to delay it to airo's kthread.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index 44a2270..98014c0 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -3938,8 +3938,10 @@ static u16 issuecommand(struct airo_info *ai, Cmd *pCmd, Resp *pRsp) {
 		if ((IN4500(ai, COMMAND)) == pCmd->cmd)
 			// PC4500 didn't notice command, try again
 			OUT4500(ai, COMMAND, pCmd->cmd);
+#ifndef CONFIG_PREEMPT_HARDIRQS
 		if (!in_atomic() && (max_tries & 255) == 0)
 			schedule();
+#endif
 	}
 
 	if ( max_tries == -1 ) {
Re: [PATCH] Fix compilation of drivers with -O0
[that still wasn't right, here's for the 3rd and final time.] It is sometimes useful to compile individual drivers with optimization disabled for easier debugging. Currently drivers which use htonl() and similar functions don't compile with -O0. This patch fixes it. It also removes obsolete and misleading comments. This header is not for userspace, so we don't have to care about strange programs these comments mention. Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]> diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h index e86e4a9..3dc715b 100644 --- a/include/linux/byteorder/generic.h +++ b/include/linux/byteorder/generic.h @@ -124,19 +124,8 @@ #define be32_to_cpus __be32_to_cpus #define cpu_to_be16s __cpu_to_be16s #define be16_to_cpus __be16_to_cpus -#endif - -#if defined(__KERNEL__) /* - * Handle ntohl and suches. These have various compatibility - * issues - like we want to give the prototype even though we - * also have a macro for them in case some strange program - * wants to take the address of the thing or something.. - * - * Note that these used to return a "long" in libc5, even though - * long is often 64-bit these days.. Thus the casts. - * * They have to be macros in order to do the constant folding * correctly - if the argument passed into a inline function * it is no longer constant according to gcc.. @@ -147,17 +136,6 @@ #undef htonl #undef htons -/* - * Do the prototypes. Somebody might want to take the - * address or some such sick thing.. 
- */ -extern __u32 ntohl(__be32); -extern __be32 htonl(__u32); -extern __u16 ntohs(__be16); -extern __be16 htons(__u16); - -#if defined(__GNUC__) && defined(__OPTIMIZE__) - #define ___htonl(x) __cpu_to_be32(x) #define ___htons(x) __cpu_to_be16(x) #define ___ntohl(x) __be32_to_cpu(x) @@ -168,9 +146,6 @@ extern __be16 htons(__u16); #define htons(x) ___htons(x) #define ntohs(x) ___ntohs(x) -#endif /* OPTIMIZE */ - #endif /* KERNEL */ - #endif /* _LINUX_BYTEORDER_GENERIC_H */
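The comment kept by the patch says the conversions have to be macros so that constant folding works. A small stand-alone illustration of that point follows; it is a sketch with made-up names (MODEL_SWAB32, model_magic_swapped, model_classify), it does not use the kernel headers, and it assumes a C11 compiler for static_assert.

```c
#include <assert.h>
#include <stdint.h>

/* Byte swap written as a macro, so a constant argument stays a constant. */
#define MODEL_SWAB32(x) ((uint32_t)( \
	(((uint32_t)(x) & 0x000000ffu) << 24) | \
	(((uint32_t)(x) & 0x0000ff00u) <<  8) | \
	(((uint32_t)(x) & 0x00ff0000u) >>  8) | \
	(((uint32_t)(x) & 0xff000000u) >> 24)))

/* Checked at compile time, independent of the optimization level. */
static_assert(MODEL_SWAB32(0x11223344u) == 0x44332211u,
	      "macro must fold to a constant");

/* A static initializer requires a constant expression. */
static const uint32_t model_magic_swapped = MODEL_SWAB32(0x11223344u);

/* So does a case label; a function call would not be accepted here. */
static int model_classify(uint32_t wire)
{
	switch (wire) {
	case MODEL_SWAB32(0x11223344u):
		return 1;
	default:
		return 0;
	}
}
```

Because the macro form is an ordinary constant expression, it behaves the same at -O0 and -O2, which is the property the patch relies on when it makes the macro definitions unconditional.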
Re: [PATCH] Fix compilation of drivers with -O0
[Sorry, the patch was corrupted by the mailer. Hopefully it's ok this time.] It is sometimes useful to compile individual drivers with optimization disabled for easier debugging. Currently drivers which use htonl() and similar functions don't compile with -O0. This patch fixes it. It also removes obsolete and misleading comments. This header is not for userspace, so we don't have to care about strange programs these comments mention. Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]> diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h index e86e4a9..3dc715b 100644 --- a/include/linux/byteorder/generic.h +++ b/include/linux/byteorder/generic.h @@ -124,19 +124,8 @@ #define be32_to_cpus __be32_to_cpus #define cpu_to_be16s __cpu_to_be16s #define be16_to_cpus __be16_to_cpus -#endif - -#if defined(__KERNEL__) /* - * Handle ntohl and suches. These have various compatibility - * issues - like we want to give the prototype even though we - * also have a macro for them in case some strange program - * wants to take the address of the thing or something.. - * - * Note that these used to return a "long" in libc5, even though - * long is often 64-bit these days.. Thus the casts. - * * They have to be macros in order to do the constant folding * correctly - if the argument passed into a inline function * it is no longer constant according to gcc.. @@ -147,17 +136,6 @@ #undef htonl #undef htons -/* - * Do the prototypes. Somebody might want to take the - * address or some such sick thing.. 
- */ -extern __u32ntohl(__be32); -extern __be32htonl(__u32); -extern __u16ntohs(__be16); -extern __be16htons(__u16); - -#if defined(__GNUC__) && defined(__OPTIMIZE__) - #define ___htonl(x) __cpu_to_be32(x) #define ___htons(x) __cpu_to_be16(x) #define ___ntohl(x) __be32_to_cpu(x) @@ -168,9 +146,6 @@ extern __be16htons(__u16); #define htons(x) ___htons(x) #define ntohs(x) ___ntohs(x) -#endif /* OPTIMIZE */ - #endif /* KERNEL */ - #endif /* _LINUX_BYTEORDER_GENERIC_H */
[PATCH] Fix compilation of drivers with -O0
It is sometimes useful to compile individual drivers with optimization disabled for easier debugging. Currently drivers which use htonl() and similar functions don't compile with -O0. This patch fixes it.

It also removes obsolete and misleading comments. This header is not for userspace, so we don't have to care about strange programs these comments mention.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h
index e86e4a9..3dc715b 100644
--- a/include/linux/byteorder/generic.h
+++ b/include/linux/byteorder/generic.h
@@ -124,19 +124,8 @@
 #define be32_to_cpus __be32_to_cpus
 #define cpu_to_be16s __cpu_to_be16s
 #define be16_to_cpus __be16_to_cpus
 
-#endif
-
-#if defined(__KERNEL__)
 /*
- * Handle ntohl and suches. These have various compatibility
- * issues - like we want to give the prototype even though we
- * also have a macro for them in case some strange program
- * wants to take the address of the thing or something..
- *
- * Note that these used to return a "long" in libc5, even though
- * long is often 64-bit these days.. Thus the casts.
- *
  * They have to be macros in order to do the constant folding
  * correctly - if the argument passed into a inline function
  * it is no longer constant according to gcc..
@@ -147,17 +136,6 @@
 #undef htonl
 #undef htons
 
-/*
- * Do the prototypes. Somebody might want to take the
- * address or some such sick thing..
- */
-extern __u32 ntohl(__be32);
-extern __be32 htonl(__u32);
-extern __u16 ntohs(__be16);
-extern __be16 htons(__u16);
-
-#if defined(__GNUC__) && defined(__OPTIMIZE__)
-
 #define ___htonl(x) __cpu_to_be32(x)
 #define ___htons(x) __cpu_to_be16(x)
 #define ___ntohl(x) __be32_to_cpu(x)
@@ -168,9 +146,6 @@ extern __be16 htons(__u16);
 #define htons(x) ___htons(x)
 #define ntohs(x) ___ntohs(x)
 
-#endif /* OPTIMIZE */
-
 #endif /* KERNEL */
-
 
 #endif /* _LINUX_BYTEORDER_GENERIC_H */
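The comment kept by the patch says the byte-order helpers have to be macros so that constant folding works. The following is a minimal userspace sketch of that point, not kernel code: SWAB32, MY_HTONL, loopback_be and is_loopback are hypothetical names, and the endianness test assumes a compiler that predefines __BYTE_ORDER__ (GCC and Clang do). A macro that expands to a constant expression can be used in a static initializer or a case label, which a function call cannot, and it keeps working with optimization disabled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's constant byte swap. */
#define SWAB32(x) ((uint32_t)( \
	(((uint32_t)(x) & 0x000000ffu) << 24) | \
	(((uint32_t)(x) & 0x0000ff00u) <<  8) | \
	(((uint32_t)(x) & 0x00ff0000u) >>  8) | \
	(((uint32_t)(x) & 0xff000000u) >> 24)))

/* Assumption: little-endian unless the compiler says otherwise. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MY_HTONL(x) ((uint32_t)(x))
#else
#define MY_HTONL(x) SWAB32(x)
#endif

/* A static initializer must be a constant expression. */
static const uint32_t loopback_be = MY_HTONL(0x7f000001u);

/* A case label must be an integer constant expression too. */
static int is_loopback(uint32_t addr_be)
{
	switch (addr_be) {
	case MY_HTONL(0x7f000001u):
		return 1;
	default:
		return 0;
	}
}
```

Because the macro does not depend on the optimizer, a file using it builds the same way with -O0 and -O2, which is the property the patch is after.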
Re: [OOPS] on 2.6.20-rc5-rt10
Remy Bohmer wrote:
> Hello All,
>
> Once in a while we see the following stacktrace. We do not know yet the exact condition that generates this, but is there anyone that recognises this oops?
>
> Kind Regards,
>
> Remy Bohmer

[...]

> Jan 30 14:09:20 localhost kernel: Modules linked in: cap_over commoncap i2c_dev uhci_hcd i2c_i801 i2c_core ehci_hcd

What's the cap_over module? I can't find it in my kernel anywhere.

Michal
Re: What happen when hangs !!
Jaswinder Singh wrote:
> Sometimes my machine hangs in userspace area like this :-
>
> VFS: Mounted root (ext3 filesystem).
> Freeing init memory: 124K
> INIT: <>
>
> OR
>
> VFS: Mounted root (ext3 filesystem).
> Freeing init memory: 124K
> INIT: version 2.85 booting <>
>
> How can I debug this hang, what are the cases.

When it hangs, try to capture the list of processes using Alt+SysRq+T. You need to have CONFIG_MAGIC_SYSRQ enabled in the kernel.

Michal
Re: PREEMPT is messing with everyone
Jaswinder Singh wrote:
> Yes, Compiler will remove it but this looks ugly and confusing.
>
> Why dont we use like this :-
>
> #ifdef CONFIG_PREEMPT
> #include
> #endif
>
> #ifdef CONFIG_PREEMPT
> preempt_disable();
> #endif
>
> #ifdef CONFIG_PREEMPT
> preempt_enable();
> #endif

Surely you're joking. It is much more readable and maintainable to hide the #ifdef-hackery in header files than to clutter the *.c files.

Michal
Re: PREEMPT is messing with everyone
Jaswinder Singh wrote:
> Hi,
>
> preempt stuff SHOULD only stay in #ifdef CONFIG_PREEMP_* , but it is messing with everyone even though not defined.
>
> e.g.
>
> 1. linux-2.6.19/kernel/spinlock.c
>
> Line 18: #include
> Line 26: preempt_disable();
> Line 32: preempt_disable();
> and so on .

Don't worry. These compile into "do { } while (0)" (i.e. nothing) when CONFIG_PREEMPT is not set.

> 2. linux-2.6.19/kernel/sched.c
>
> Line 1096: int preempted;
> Line 1104: preempted = !task_running(rq, p);
> Line 1106: if (preempted)
> Line 2059: if (TASK_PREEMPTS_CURR(p, this_rq))

Linux always does preemptive multitasking of user tasks. These have nothing to do with CONFIG_PREEMPT.

> Line 3355:current->comm, preempt_count(), current->pid);
> Line 3342: preempt_disable();
> Line 3375: if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {

preempt_count() is useful in !CONFIG_PREEMPT kernels too. It stores information about the current context (hardirq, softirq, ...).

[...]

> 70 to 80 % of this code is removed when compiled. but 20 to 30 % code left in binary kernel image.
>
> Why Linux kernel is wasting its resources which is not defined at all.

I don't think that's the case.

> Any solution ?
>
> Thank you,
>
> Best Regards,
>
> Jaswinder Singh.

Michal
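The pattern described in the reply above (hide the #ifdef in a header so that callers stay unconditional) can be sketched in plain C. This is an illustration only: MY_CONFIG_PREEMPT, my_preempt_disable(), my_preempt_enable(), my_preempt_count and critical_section() are hypothetical names, not the kernel's definitions.

```c
/*
 * "Header" part: the only place that knows about the option.
 * Build with -DMY_CONFIG_PREEMPT to get the counting variant.
 */
#ifdef MY_CONFIG_PREEMPT
static int my_preempt_count;
#define my_preempt_disable()	do { my_preempt_count++; } while (0)
#define my_preempt_enable()	do { my_preempt_count--; } while (0)
#else
/* Option off: the calls expand to empty statements and generate no code. */
#define my_preempt_disable()	do { } while (0)
#define my_preempt_enable()	do { } while (0)
#endif

/* "*.c" part: identical source whether or not the option is set. */
static int critical_section(int *shared)
{
	my_preempt_disable();
	(*shared)++;
	my_preempt_enable();
	return *shared;
}
```

The do { } while (0) form makes the empty macro behave like a single statement, so it stays safe inside an unbraced if/else.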
Re: 2.6.19-rc6-rt3, yum repo
Ingo Molnar wrote:
> i've released the 2.6.18-rc6-rt3 tree

Hi Ingo,

lockdep doesn't compile on UP. per_cpu_offset only makes sense on SMP.

Michal

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 8f6ba22..d46082d 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -1194,8 +1194,13 @@ register_lock_class(struct lockdep_map *
 	 */
 	if (!static_obj(lock->key)) {
 		debug_locks_off();
+#ifdef CONFIG_SMP
 		printk("INFO: trying to register non-static key %p (%016lx).\n",
 			lock->key, per_cpu_offset(raw_smp_processor_id()));
+#else
+		printk("INFO: trying to register non-static key %p.\n",
+			lock->key);
+#endif
 		printk("the code is fine but needs lockdep annotation.\n");
 		printk("turning off the locking correctness validator.\n");
 		dump_stack();
Re: How to find out kernel stack over flow?
nazim khan wrote:
> I suspect that one of my module that I am inserting in the kernel may be causing the stack overflow which is leading to kernel crash (maybe because it is corrupting someone else's memory).
>
> How can I find this out?

You could enable CONFIG_DEBUG_STACKOVERFLOW. If you showed us your module's source code, someone might see the bug.

Michal
Re: 2.6.13-rc6-rt1
Ingo Molnar wrote:
> i've released the 2.6.13-rc6-rt1 tree, which can be downloaded from the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> as the name already suggests, i've switched to a new, simplified naming scheme, which follows the usual naming convention of trees tracking the mainline kernel. The numbering will be restarted for every new upstream kernel the -RT tree is merged to.

Great! With this naming scheme it is easy to teach Matt Mackall's ketchup script about the -RT tree. The modified ketchup script can be downloaded from:
http://www.uamt.feec.vutbr.cz/rizeni/pom/ketchup-0.9+rt

Matt, would you release a new ketchup version with this support for Ingo's tree?

Michal

--- ketchup-0.9 2005-08-16 14:06:20.0 +0200
+++ ketchup-0.9+rt 2005-08-16 14:24:05.0 +0200
@@ -307,7 +307,11 @@ version_info = {
     '2.6-mjb': (latest_mjb,
                 kernel_url + "/people/mbligh/%(prebase)s/patch-%(full)s.bz2",
                 r'patch-(2.6.*?).bz2',
-                1, "Martin Bligh's random collection 'o crap")
+                1, "Martin Bligh's random collection 'o crap"),
+    '2.6-rt': (latest_dir,
+               "http://people.redhat.com/mingo/realtime-preempt/patch-%(full)s",
+               r'patch-(2.6.*?)',
+               0, "Ingo Molnar's realtime-preempt kernel")
 }
 
 def version_url(ver, sign = 0):
Re: captive-ntfs FUSE support?
Kristoffer wrote:
> captive ntfs:
> http://www.jankratochvil.net/project/captive/
> http://www.jankratochvil.net/project/captive/CVS.html.pl
>
> Can someone please port cvs captive-ntfs to FUSE?

OK. How much do you pay me?

Michal
Re: amd64-agp vs. swsusp
Pavel Machek wrote:
> I assume it is in -rc6, too; it is long-standing bug and I am not aware of any attempts at fixing it. Please file bug report, assign to me.

I've filed it as Bug 5018.

Michal
Re: [linux-pm] [PATCH] swsusp: simpler calculation of number of pages in PBE list
Rafael J. Wysocki wrote:
> On Friday, 29 of July 2005 21:46, Michal Schmidt wrote:
>> The function calc_nr uses an iterative algorithm to calculate the number of pages needed for the image and the pagedir. Exactly the same result can be obtained with a one-line expression.
>
> Could you please post the proof?
>
> Rafael

OK, attached is a proof-by-brute-force program. It compares the results of the original function and the simplified one. This is its output:

$ ./calc_nr2
checked 0 ...
checked 100000000 ...
checked 200000000 ...
checked 300000000 ...
checked 400000000 ...
checked 500000000 ...
checked 600000000 ...
checked 700000000 ...
checked 800000000 ...
checked 900000000 ...
checked 1000000000 ...
checked 1100000000 ...
checked 1200000000 ...
checked 1300000000 ...
checked 1400000000 ...
checked 1500000000 ...
checked 1600000000 ...
checked 1700000000 ...
checked 1800000000 ...
checked 1900000000 ...
checked 2000000000 ...
checked 2100000000 ...
First difference at 2130706433: -2147483646 x -2147483647

It means that the two functions give the same results for sensible values of the input argument. The results only differ when they overflow into negative values, at which point both results are useless.

Michal

#include <stdio.h>

typedef struct {
	unsigned long val;
} swp_entry_t;

typedef struct pbe {
	unsigned long address;
	unsigned long orig_address;
	swp_entry_t swap_address;
	struct pbe *next;
} suspend_pagedir_t;

#define PAGE_SIZE 4096
#define PBES_PER_PAGE (PAGE_SIZE/sizeof(struct pbe))

/* The original iterative version from kernel/power/swsusp.c */
static int calc_nr_orig(int nr_copy)
{
	int extra = 0;
	int mod = !!(nr_copy % PBES_PER_PAGE);
	int diff = (nr_copy / PBES_PER_PAGE) + mod;

	do {
		extra += diff;
		nr_copy += diff;
		mod = !!(nr_copy % PBES_PER_PAGE);
		diff = (nr_copy / PBES_PER_PAGE) + mod - extra;
	} while (diff > 0);

	return nr_copy;
}

/* The proposed one-line replacement */
static int calc_nr(int nr_copy)
{
	return nr_copy + (nr_copy+PBES_PER_PAGE-2)/(PBES_PER_PAGE-1);
}

int main()
{
	int i;

	/* Walk all non-negative ints; stop at the first mismatch. */
	for (i=0; i>=0; i++) {
		if (i%100000000 == 0)
			printf("checked %d ...\n", i);
		if (calc_nr(i) != calc_nr_orig(i)) {
			printf("First difference at %d: %d x %d\n", i, calc_nr(i), calc_nr_orig(i));
			break;
		}
	}
	return 0;
}
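The brute-force check can be complemented by a short argument for why the closed form matches the loop. This is a sketch that ignores integer overflow; the symbols n, P and k are introduced here for the derivation, with n = nr_copy, P = PBES_PER_PAGE (assumed to be at least 2) and k the value of `extra`.

```latex
% The loop starts from k_0 = \lceil n/P \rceil, updates
% k_{i+1} = \lceil (n + k_i)/P \rceil, and stops at the first k with
% \lceil (n + k)/P \rceil \le k. For integer k:
\begin{align*}
\left\lceil \frac{n+k}{P} \right\rceil \le k
  &\iff n + k \le kP \\
  &\iff k \ge \frac{n}{P-1} \\
  &\iff k \ge \left\lceil \frac{n}{P-1} \right\rceil .
\end{align*}
% So the smallest admissible value is
\[
  k^{*} = \left\lceil \frac{n}{P-1} \right\rceil
        = \left\lfloor \frac{n + P - 2}{P - 1} \right\rfloor .
\]
% The iterates never overshoot: k_0 \le k^{*}, and k_i \le k^{*} implies
% k_{i+1} = \lceil (n + k_i)/P \rceil \le \lceil (n + k^{*})/P \rceil \le k^{*}.
% They increase strictly until the stopping condition holds, so the loop
% ends exactly at k^{*} and returns
\[
  n + k^{*} = n + \left\lfloor \frac{n + P - 2}{P - 1} \right\rfloor ,
\]
% which is the one-line expression in integer arithmetic.
```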
[PATCH] swsusp: simpler calculation of number of pages in PBE list
The function calc_nr uses an iterative algorithm to calculate the number of pages needed for the image and the pagedir. Exactly the same result can be obtained with a one-line expression.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff -Nurp -X dontdiff.new linux-mm/kernel/power/swsusp.c linux-mm.mich/kernel/power/swsusp.c
--- linux-mm/kernel/power/swsusp.c 2005-07-28 13:57:53.0 +0200
+++ linux-mm.mich/kernel/power/swsusp.c 2005-07-29 21:01:46.0 +0200
@@ -737,18 +737,7 @@ static void copy_data_pages(void)
 
 static int calc_nr(int nr_copy)
 {
-	int extra = 0;
-	int mod = !!(nr_copy % PBES_PER_PAGE);
-	int diff = (nr_copy / PBES_PER_PAGE) + mod;
-
-	do {
-		extra += diff;
-		nr_copy += diff;
-		mod = !!(nr_copy % PBES_PER_PAGE);
-		diff = (nr_copy / PBES_PER_PAGE) + mod - extra;
-	} while (diff > 0);
-
-	return nr_copy;
+	return nr_copy + (nr_copy+PBES_PER_PAGE-2)/(PBES_PER_PAGE-1);
 }
 
 /**
Re: [RFT] solve "swsusp plays yoyo" with disks
Michal Schmidt wrote:
> Pavel Machek wrote:
>> Hi!
>>
>> I'd like to get this tested under as many configurations as possible. With this, your hdd should no longer do "yoyo" (spindown, spinup, spindown) during suspend...
>
> It looks like the patch is now in -mm (I use 2.6.13-rc3-mm1). But my disks still yoyo during suspend. What more is needed? Some patch to ide-disk.c ?

I think I've found the problem. The attached patch stops the disks from spinning down and up on suspend. The patch applies to 2.6.13-rc3-mm1.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff -Nurp -X dontdiff.new linux-mm/drivers/ide/ide-io.c linux-mm.mich/drivers/ide/ide-io.c
--- linux-mm/drivers/ide/ide-io.c 2005-06-30 01:00:53.0 +0200
+++ linux-mm.mich/drivers/ide/ide-io.c 2005-07-21 16:59:46.0 +0200
@@ -150,7 +150,7 @@ static void ide_complete_power_step(ide_
 
 	switch (rq->pm->pm_step) {
 	case ide_pm_flush_cache:	/* Suspend step 1 (flush cache) complete */
-		if (rq->pm->pm_state == 4)
+		if (rq->pm->pm_state == PM_EVENT_FREEZE)
 			rq->pm->pm_step = ide_pm_state_completed;
 		else
 			rq->pm->pm_step = idedisk_pm_standby;
Re: amd64-agp vs. swsusp
Pavel Machek wrote:
>> I'm trying to do something similar for x86_64. See the attached patch. Unfortunately, it doesn't help. The behaviour seems unchanged (resume still works iff amd64-agp wasn't loaded before suspend).
>
> Are you sure problem is on level4_pgt? We probably use constant level4_pgt but split pages at some deeper level. You may want try saving 3rd-level table, instead.

I'm not sure about that at all. That was just my attempt at cargo-cult programming :-)

OK, I'll try saving the 3rd-level table. It'll take me some time to figure out how to do that, however :-)

Michal
Re: [RFT] solve "swsusp plays yoyo" with disks
Pavel Machek wrote:
> Hi!
>
> I'd like to get this tested under as many configurations as possible. With this, your hdd should no longer do "yoyo" (spindown, spinup, spindown) during suspend...

It looks like the patch is now in -mm (I use 2.6.13-rc3-mm1). But my disks still yoyo during suspend. What more is needed? Some patch to ide-disk.c ?

Michal
Re: amd64-agp vs. swsusp
Pavel Machek wrote:
> Long time ago there were i386 problems because we assumed that kernel is mapped in one big mapping and agp broke that assumption. Copying pages backwards "fixed" it (and then we done proper fix). It should not be, but it seems similar to this problem

Do you mean this patch of yours?:
http://www.ussg.iu.edu/hypermail/linux/kernel/0404.3/0640.html

I'm trying to do something similar for x86_64. See the attached patch. Unfortunately, it doesn't help. The behaviour seems unchanged (resume still works iff amd64-agp wasn't loaded before suspend).

Michal

diff -Nurp -X dontdiff.new linux-mm/arch/x86_64/kernel/suspend_asm.S linux-mm.mich/arch/x86_64/kernel/suspend_asm.S
--- linux-mm/arch/x86_64/kernel/suspend_asm.S 2005-06-30 01:00:53.0 +0200
+++ linux-mm.mich/arch/x86_64/kernel/suspend_asm.S 2005-07-21 11:53:17.0 +0200
@@ -41,7 +41,7 @@ ENTRY(swsusp_arch_suspend)
 
 ENTRY(swsusp_arch_resume)
 	/* set up cr3 */
-	leaq	init_level4_pgt(%rip),%rax
+	leaq	swsusp_level4_pgt(%rip),%rax
 	subq	$__START_KERNEL_map,%rax
 	movq	%rax,%cr3
 
diff -Nurp -X dontdiff.new linux-mm/arch/x86_64/mm/init.c linux-mm.mich/arch/x86_64/mm/init.c
--- linux-mm/arch/x86_64/mm/init.c 2005-07-18 19:48:12.0 +0200
+++ linux-mm.mich/arch/x86_64/mm/init.c 2005-07-21 11:21:36.0 +0200
@@ -310,10 +310,32 @@ void __init init_memory_mapping(unsigned
 
 extern struct x8664_pda cpu_pda[NR_CPUS];
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+/*
+ * Swap suspend & friends need this for resume because things like the intel-agp
+ * driver might have split up a kernel 4MB mapping.
+ */
+char __nosavedata swsusp_level4_pgt[PAGE_SIZE]
+	__attribute__ ((aligned (PAGE_SIZE)));
+
+static inline void save_pg_dir(void)
+{
+	memcpy(swsusp_level4_pgt, init_level4_pgt, PAGE_SIZE);
+}
+#else
+static inline void save_pg_dir(void)
+{
+}
+#endif
+
 /* Assumes all CPUs still execute in init_mm */
 void zap_low_mappings(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
+	pgd_t *pgd;
+
+	save_pg_dir();
+
+	pgd = pgd_offset_k(0UL);
 	pgd_clear(pgd);
 	flush_tlb_all();
 }
Re: amd64-agp vs. swsusp
Rafael J. Wysocki wrote:
> On Thursday, 21 of July 2005 00:07, Michal Schmidt wrote:
>> I also tried putting a printk before restore_processor_state(), but I'm not sure if it is safe to use printk there.
>
> Yes, it is, but you may be unable to see the message if the box reboots before it can be displayed.

OK, but then I also tried putting a 5s long busy wait there and the reset was not delayed. Therefore, the reset must be occurring before restore_processor_state(). Or is there a reason why

	for(i=0; i<5000; i++) udelay(1000);

wouldn't work as expected?

Michal
Re: amd64-agp vs. swsusp
Rafael J. Wysocki wrote:
> On Tuesday, 19 of July 2005 23:26, Michal Schmidt wrote:
>> I have rebuilt agpgart and amd64-agp into the kernel and now it has resumed successfully for the first time. Thank you for the hint! But I still wonder, why that makes a difference.
>
> Before resume the module is not present. When it gets loaded from the image it probably runs with the assumption that the hardware was initialized which is not correct.

It seems that the module doesn't even get a chance to run after resume. I've put some printks and udelays into kernel/power/swsusp.c and other places and I've found that the spontaneous reset occurs already in swsusp_arch_resume(), ie. before the drivers get their resume methods called.

This is what I have in swsusp_suspend() now:

	...
	save_processor_state();
	if ((error = swsusp_arch_suspend()))
		printk(KERN_ERR "Error %d suspending\n", error);
	/* Restore control flow magically appears here */
	restore_processor_state();
	printk(KERN_INFO "processor state restored!\n"); /*I added this*/
	BUG_ON (nr_copy_pages_check != nr_copy_pages);
	restore_highmem();
	device_power_up();
	...

I'm recording the screen during resuming with a digital camera to see if the added printk is displayed before the reset and I am now sure that the reset occurs before that. The last thing I see is:

Stopping tasks: --|
Freeing memory... done (0 pages freed)
swsusp: Need to copy 8121 pages

Then on the next frame of the recorded MPEG, the display is already beginning to dim as the computer is resetting.

I also tried putting a printk before restore_processor_state(), but I'm not sure if it is safe to use printk there. So I tried putting a loop of 5000 x udelay(1000) there to see if the reset would be delayed by 5s. It was not delayed, so I think that the reset occurs before restore_processor_state().

Michal
Re: amd64-agp vs. swsusp
Andreas Steinmetz wrote:
> Michal Schmidt wrote:
>> Does resuming from swsuspend work for anyone with amd64-agp loaded? On my system when I suspend with amd64-agp loaded, I get a spontaneous reboot on resume. It reboots immediately after reading the saved image from disk. This is 100% reproducible. Athlon 64 FX-53, Asus A8V Deluxe, Linux 2.6.13-rc3-mm1.
>
> AMD Athlon(tm) 64 Processor 3000+, Acer Aspire
> Linux gringo 2.6.13-rc3-gringo #36 Sun Jul 17 15:57:17 CEST 2005 x86_64 unknown unknown GNU/Linux
>
> CONFIG_AGP=y
> CONFIG_AGP_AMD64=y
>
> swsusp works for me. Could it be mm, agp as a module or some speciality
                                       ^^^
That seems to be the problem!

> of your hardware?

I have rebuilt agpgart and amd64-agp into the kernel and now it has resumed successfully for the first time. Thank you for the hint! But I still wonder, why that makes a difference.

Michal
amd64-agp vs. swsusp
Hello,

Does resuming from swsuspend work for anyone with amd64-agp loaded? On my system when I suspend with amd64-agp loaded, I get a spontaneous reboot on resume. It reboots immediately after reading the saved image from disk. This is 100% reproducible.

Athlon 64 FX-53, Asus A8V Deluxe, Linux 2.6.13-rc3-mm1.

Regards,
Michal
Re: rt-preempt and x86_64?
Alistair John Strachan wrote:
> Hi Ingo,
>
> (I searched the list for rt realtime x86_64 x86-64 before posting this, so I hope it's not a duplicate).
>
> I've noticed -31 compiles without notable error or warning on x86-64, so I thought maybe it was a valid time to file a bug report about it not working. The machine currently runs 2.6.12 but when booting with PREEMPT_RT mode on the same machine I get:
>
> init[1]: segfault at 8010e9c4 rip 8010e9c4 rsp 7fe28018

[...]

Do you have latency tracing enabled in the kernel config? Try disabling it. It's a known problem that it doesn't work on x86_64.

Michal
Re: Realtime Preemption, 2.6.12, Beginners Guide?
Fernando Lopez-Lezcano wrote:
> I see the same thing. "CONFIG_PRINTK_IGNORE_LOGLEVEL is not set" but still printk ignores the loglevel (I commented out the #ifdef in kernel/printk.c to make the spurious messages go away).

The condition is reversed. The '#ifdef CONFIG_PRINTK_IGNORE_LOGLEVEL' should be '#ifndef CONFIG_PRINTK_IGNORE_LOGLEVEL'.

Michal
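The effect of the inverted test can be shown with a small model. This is not the printk.c code from the -rt patch, which is not quoted in the thread; MY_IGNORE_LOGLEVEL and should_reach_console() are hypothetical names, and the sketch only assumes the usual rule that a message reaches the console when its level number is lower than the console loglevel.

```c
/*
 * Hypothetical model of a loglevel filter guarded by a config option.
 * Build with -DMY_IGNORE_LOGLEVEL to let every message through.
 */
static int should_reach_console(int msg_level, int console_level)
{
#ifndef MY_IGNORE_LOGLEVEL
	/* Option off: honour the loglevel (lower number = more urgent). */
	if (msg_level >= console_level)
		return 0;
#endif
	return 1;
}
```

With `#ifdef` in place of `#ifndef`, the filter would only be compiled in when the "ignore" option is set, so a kernel with the option unset would print everything, which matches the symptom reported above.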
Re: PROBLEM: please remove reserved word "new" from kernel headers
Rob Prowel wrote:
> [1.] One line summary of the problem:
>
> 2.4 and 2.6 kernel headers use c++ reserved word "new" as identifier in function prototypes.

Yes, the kernel is written in C, not C++.

> using the identifier "new" in kernel headers that are visible to applications programs is a bad idea.

Programs are not supposed to include kernel headers. This is a FAQ, see the archives.

Michal
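For readers unsure why this is only a problem for C++ callers, a minimal sketch: `new` is an ordinary identifier in C. The struct node type and list_insert_after() below are hypothetical and not taken from any kernel header.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
	int value;
	struct node *next;
};

/*
 * Valid C: "new" is just a parameter name here. A C++ compiler rejects
 * the same declaration because "new" is a keyword in that language.
 */
static void list_insert_after(struct node *prev, struct node *new)
{
	new->next = prev->next;
	prev->next = new;
}
```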
Re: How's the nforce4 support in Linux?
Julien Wajsberg wrote:
> Good point... I just tried, but forcedeth doesn't support netpoll. If you have a pointer, I could try to implement it ;-)

Can you try the attached patch for forcedeth? It compiles for me, but I don't have nForce hardware to test it.

Michal

--- linux-2.6.12-rc1/drivers/net/forcedeth.c.orig 2005-03-26 15:00:12.0 +0100
+++ linux-2.6.12-rc1/drivers/net/forcedeth.c 2005-03-26 15:08:56.0 +0100
@@ -1480,6 +1480,13 @@ static void nv_do_nic_poll(unsigned long
 	enable_irq(dev->irq);
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void nv_poll_controller(struct net_device *dev)
+{
+	nv_do_nic_poll((long) dev);
+}
+#endif
+
 static void nv_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
 {
 	struct fe_priv *np = get_nvpriv(dev);
@@ -1962,6 +1969,9 @@ static int __devinit nv_probe(struct pci
 	dev->get_stats = nv_get_stats;
 	dev->change_mtu = nv_change_mtu;
 	dev->set_multicast_list = nv_set_multicast;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = nv_poll_controller;
+#endif
 	SET_ETHTOOL_OPS(dev, &ops);
 	dev->tx_timeout = nv_tx_timeout;
 	dev->watchdog_timeo = NV_WATCHDOG_TIMEO;
Re: dmesg command output
cranium2003 wrote:
[...]
> On my RH9 i386 arch i got 16kb output from dmesg. how to increase it?

man dmesg (parameter -s). You may also want to increase the kernel buffer size in General Setup -> Kernel log buffer size (CONFIG_LOG_BUF_SHIFT).

Michal
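CONFIG_LOG_BUF_SHIFT is a power-of-two exponent: the kernel log buffer holds 1 << CONFIG_LOG_BUF_SHIFT bytes. A small sketch of the arithmetic follows; log_buf_bytes() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* Size in bytes of a log buffer configured with the given shift. */
static size_t log_buf_bytes(unsigned int shift)
{
	return (size_t)1 << shift;
}
```

A shift of 14 gives 16384 bytes, which matches the 16 KB seen above, and a shift of 17 gives 131072 bytes (128 KB). The size passed to dmesg -s should be at least as large as the kernel buffer, otherwise dmesg still truncates the output.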
Re: [PATCH] fix verify_command to allow burning more than 1 DVD
Peter Osterlund wrote:
> Michal Schmidt <[EMAIL PROTECTED]> writes:
>
>> --- linux-2.6.11-mm1/drivers/block/scsi_ioctl.c.orig 2005-01-17 20:42:40.0 +0100
>> +++ linux-2.6.11-mm1/drivers/block/scsi_ioctl.c 2005-01-17 20:43:14.0 +0100
>> @@ -197,9 +197,7 @@ static int verify_command(struct file *f
>>  	if (type & CMD_WRITE_SAFE) {
>>  		if (file->f_mode & FMODE_WRITE)
>>  			return 0;
>> -	}
>> -
>> -	if (!(type & CMD_WARNED)) {
>> +	} else if (!(type & CMD_WARNED)) {
>>  		cmd_type[cmd[0]] = CMD_WARNED;
>>  		printk(KERN_WARNING "scsi: unknown opcode 0x%02x\n", cmd[0]);
>>  	}
>
> That patch will not write the warning message in some cases.

Yes. In cases when the device is opened for reading and the command is known as safe_for_write. Do we really want to print this warning in that case?

> I think this patch is better:
>
> ---
>
>  linux-petero/drivers/block/scsi_ioctl.c | 2 +-
>  1 files changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN drivers/block/scsi_ioctl.c~scsi-filter drivers/block/scsi_ioctl.c
> --- linux/drivers/block/scsi_ioctl.c~scsi-filter 2005-01-18 23:38:37.966026728 +0100
> +++ linux-petero/drivers/block/scsi_ioctl.c 2005-01-18 23:38:37.970026120 +0100
> @@ -200,7 +200,7 @@ static int verify_command(struct file *f
>  	}
>  
>  	if (!(type & CMD_WARNED)) {
> -		cmd_type[cmd[0]] = CMD_WARNED;
> +		cmd_type[cmd[0]] |= CMD_WARNED;
>  		printk(KERN_WARNING "scsi: unknown opcode 0x%02x\n", cmd[0]);
>  	}
>  
> _
Re: Unable to burn DVDs
Bill Davidsen wrote:
> Nick Sanders wrote:
>> For me when running growisofs with user permissions on 2.6.10 (ide-cd) it works perfectly 1st time but 2nd time fails with the error below. It works fine when run as root.
>>
>> :-( unable to PREVENT MEDIA REMOVAL: Operation not permitted
>>
>> As an aside audio cd burning with cdrecord works as long as the '-text' option isn't used, if it is the process hangs.
>
> I reported a similar thing with cdrecord, writing a first session successfully using the -multi flag, but not being able to append to it or read the size with the "-msinfo" flag. I was totally blown off and told I didn't have permissions on the device, even though I was able to write to it.
>
> I believe the true answer is that the SCSI command filter is blocking a command needed to perform the operation, probably a command to lock the door of the drive. In my case I have permissions to write the CD, just not to read the info needed to write additional sessions.

Hello, Bill and Nick,

could you try the attached patch that I sent to Jens Axboe yesterday? (You can see the mail with an explanation on http://marc.theaimsgroup.com/?l=linux-kernel&m=110599420505734&w=2 )

Michal

--- linux-2.6.11-mm1/drivers/block/scsi_ioctl.c.orig 2005-01-17 20:42:40.0 +0100
+++ linux-2.6.11-mm1/drivers/block/scsi_ioctl.c 2005-01-17 20:43:14.0 +0100
@@ -197,9 +197,7 @@ static int verify_command(struct file *f
 	if (type & CMD_WRITE_SAFE) {
 		if (file->f_mode & FMODE_WRITE)
 			return 0;
-	}
-
-	if (!(type & CMD_WARNED)) {
+	} else if (!(type & CMD_WARNED)) {
 		cmd_type[cmd[0]] = CMD_WARNED;
 		printk(KERN_WARNING "scsi: unknown opcode 0x%02x\n", cmd[0]);
 	}
[PATCH] fix verify_command to allow burning more than 1 DVD
Hello,

I use K3B with growisofs to burn DVDs. After boot I can burn a DVD as a normal user. But only the first one. When I want to burn another one, K3B complains that it is unable to prevent media removal. Then only root can burn DVDs.

The bug is in the kernel in the function verify_command. When a process opens the DVD recorder with O_RDONLY and issues a command which is marked safe_for_write, this function is supposed to just return -EPERM and do nothing more. However, there is a bug that causes the command to be marked as CMD_WARNED. From now on no non-privileged process is able to issue this command even if it correctly opens the device with O_RDWR - because the command is no longer marked as CMD_WRITE_SAFE.

A patch is attached.

Michal

--- linux-2.6.11-mm1/drivers/block/scsi_ioctl.c.orig 2005-01-17 20:42:40.0 +0100
+++ linux-2.6.11-mm1/drivers/block/scsi_ioctl.c 2005-01-17 20:43:14.0 +0100
@@ -197,9 +197,7 @@ static int verify_command(struct file *f
 	if (type & CMD_WRITE_SAFE) {
 		if (file->f_mode & FMODE_WRITE)
 			return 0;
-	}
-
-	if (!(type & CMD_WARNED)) {
+	} else if (!(type & CMD_WARNED)) {
 		cmd_type[cmd[0]] = CMD_WARNED;
 		printk(KERN_WARNING "scsi: unknown opcode 0x%02x\n", cmd[0]);
 	}
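The failure sequence described above can be replayed in a small userspace model. This is a sketch, not the kernel function: the flag values, struct filter, enum variant, verify_command_model() and second_attempt_result() are all made up for illustration, and opcode 0x1e (PREVENT ALLOW MEDIUM REMOVAL) is assumed to be the write-safe command involved, based on the K3B error message. The model covers the unpatched logic, the "else if" patch from this mail, and the "|=" variant suggested later in the thread.

```c
#include <errno.h>
#include <stdio.h>

/* Illustrative flag values, not copied from the kernel. */
#define CMD_READ_SAFE	0x01
#define CMD_WRITE_SAFE	0x02
#define CMD_WARNED	0x04

enum variant { ORIGINAL, ELSE_IF_FIX, OR_FIX };

struct filter {
	unsigned char cmd_type[256];
	enum variant variant;
};

static int verify_command_model(struct filter *f, int writable,
				int privileged, unsigned char opcode)
{
	unsigned char type = f->cmd_type[opcode];

	/* Anybody who can open the device may issue a read-safe command. */
	if (type & CMD_READ_SAFE)
		return 0;

	/* Write-safe commands need a writable open. */
	if (type & CMD_WRITE_SAFE) {
		if (writable)
			return 0;
		if (f->variant == ELSE_IF_FIX)
			goto check_privilege;	/* known command: no warning */
	}

	if (!(type & CMD_WARNED)) {
		if (f->variant == OR_FIX)
			f->cmd_type[opcode] |= CMD_WARNED;	/* keeps other flags */
		else
			f->cmd_type[opcode] = CMD_WARNED;	/* wipes CMD_WRITE_SAFE */
		fprintf(stderr, "scsi: unknown opcode 0x%02x\n", opcode);
	}

check_privilege:
	if (privileged)
		return 0;
	return -EPERM;
}

/* Result of a read-write attempt made after a failed read-only attempt. */
static int second_attempt_result(enum variant v)
{
	struct filter f = { .variant = v };
	const unsigned char opcode = 0x1e;

	f.cmd_type[opcode] = CMD_WRITE_SAFE;

	/* First process: O_RDONLY, unprivileged. Refused in every variant. */
	(void)verify_command_model(&f, 0, 0, opcode);

	/* Second process: O_RDWR, unprivileged. */
	return verify_command_model(&f, 1, 0, opcode);
}
```

In this model the unpatched logic refuses the second, correctly opened attempt with -EPERM, while both fixes allow it; the "|=" variant additionally still prints the warning on the first attempt.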