Re: [PATCH] x86: fix PSE pagetable construction
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes: > When constructing the initial pagetable in pagetable_init, make sure > that non-PSE pmds are updated to PSE ones. This fixes a bug in the > paravirt pagetable init code, which otherwise tries to avoid overwriting > existing mappings. > > This moves the definition of pmd_huge() out of the hugetlbfs files > into pgtable.h. > > [ I know Eric would like to make larger changes to the way > pagetable init works, but this patch is the minimal fix to an > existing bug. ] My preference would be for whoever has paravirt_ops-hooks-to-set-up-initial-pagetable.patch queued to drop it until we can get a version that doesn't break early page table setup. Your partial fix still leaves the real page tables in a partially incorrect state, and even if we removed your changes from the PSE path, we can still wind up failing to set _PAGE_NX in the appropriate places on 4K pages. I have tried to be constructive and suggest how we can fix this cleanly. Short of that, this is what I see needing to happen to fix the above patch's changes to arch/i386/mm/init.c. Eric ... Subject: [PATCH] i386: During page table initialization always set the leaf page table entries. If we don't set the leaf page table entries, it is quite possible that we will inherit an incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here. I don't know what to do about hypervisors like Xen that require their page tables to be read-only, as our identity-mapped page table entries currently violate that requirement. All I know is that if the kernel doesn't work properly on native hardware, it is a bug. Signed-off-by: Eric W.
Biederman <[EMAIL PROTECTED]> --- arch/i386/mm/init.c | 52 +++--- 1 files changed, 20 insertions(+), 32 deletions(-) diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c index b77a43c..dbe16f6 100644 --- a/arch/i386/mm/init.c +++ b/arch/i386/mm/init.c @@ -63,18 +63,18 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd) pmd_t *pmd_table; #ifdef CONFIG_X86_PAE - pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE); - - paravirt_alloc_pd(__pa(pmd_table) >> PAGE_SHIFT); - set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT)); - pud = pud_offset(pgd, 0); - if (pmd_table != pmd_offset(pud, 0)) - BUG(); -#else + if (!(pgd_val(*pgd) & _PAGE_PRESENT)) { + pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE); + + paravirt_alloc_pd(__pa(pmd_table) >> PAGE_SHIFT); + set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT)); + pud = pud_offset(pgd, 0); + if (pmd_table != pmd_offset(pud, 0)) + BUG(); + } +#endif pud = pud_offset(pgd, 0); pmd_table = pmd_offset(pud, 0); -#endif - return pmd_table; } @@ -84,7 +84,7 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd) */ static pte_t * __init one_page_table_init(pmd_t *pmd) { - if (pmd_none(*pmd)) { + if (!(pmd_val(*pmd) & _PAGE_PRESENT)) { pte_t *page_table = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE); paravirt_alloc_pt(__pa(page_table) >> PAGE_SHIFT); @@ -109,7 +109,6 @@ static pte_t * __init one_page_table_init(pmd_t *pmd) static void __init page_table_range_init (unsigned long start, unsigned long end, pgd_t *pgd_base) { pgd_t *pgd; - pud_t *pud; pmd_t *pmd; int pgd_idx, pmd_idx; unsigned long vaddr; @@ -120,13 +119,10 @@ static void __init page_table_range_init (unsigned long start, unsigned long end pgd = pgd_base + pgd_idx; for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd++, pgd_idx++) { - if (!(pgd_val(*pgd) & _PAGE_PRESENT)) - one_md_table_init(pgd); - pud = pud_offset(pgd, vaddr); - pmd = pmd_offset(pud, vaddr); + pmd = one_md_table_init(pgd); + pmd = pmd + pmd_index(vaddr); for (; (pmd_idx < 
PTRS_PER_PMD) && (vaddr != end); pmd++, pmd_idx++) { - if (pmd_none(*pmd)) - one_page_table_init(pmd); + one_page_table_init(pmd); vaddr += PMD_SIZE; } @@ -159,11 +155,7 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base) pfn = 0; for (; pgd_idx < PTRS_PER_PGD; pgd++, pgd_idx++) { - if (!(pgd_val(*pgd) & _PAGE_PRESENT)) - pmd = one_md_table_init(pgd); - else - pmd = pmd_offset(pud_offset(pgd, PAGE_OFFSET), PAGE_OFFSET); - + pmd = one_md_table_init(pgd); if (pfn >= max_low_pfn) continue; for (pmd_idx = 0; pmd_idx < PTRS_PER_PMD
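Eric's argument can be made concrete with a toy userspace model. It is not kernel code: the flag values, the `toy_pte_t` type and both helpers are made up for illustration, and real entries also carry a page frame number. Entry 0 plays the part of a mapping inherited from head.S without NX; skipping already-present entries leaves it stale, while always rewriting the leaves fixes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy flag bits; the values are illustrative, not the real x86 encodings. */
#define TOY_PRESENT 0x1u
#define TOY_RW      0x2u
#define TOY_NX      0x4u

typedef uint32_t toy_pte_t;

/* Old shape: only fill entries that are not present yet, so anything
 * inherited from the boot page tables keeps its (possibly wrong) flags. */
static void init_leaves_skip_present(toy_pte_t *ptes, size_t n, toy_pte_t want)
{
    for (size_t i = 0; i < n; i++)
        if (!(ptes[i] & TOY_PRESENT))
            ptes[i] = want;
}

/* Shape argued for in the patch: always rewrite every leaf entry. */
static void init_leaves_always(toy_pte_t *ptes, size_t n, toy_pte_t want)
{
    for (size_t i = 0; i < n; i++)
        ptes[i] = want;
}
```

With `{ TOY_PRESENT | TOY_RW, 0, 0, 0 }` as the starting table, the first helper leaves entry 0 without `TOY_NX`; the second gives every entry the intended flags.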
Re: checkpatch, a patch checking script.
> Use WARN_ON & Recovery code rather than BUG() and BUG_ON() > 23286:+ BUILD_BUG_ON(BCM43xx_SEC_KEYSIZE < ETH_ALEN); BTW, I missed this before -- BUILD_BUG_ON() is actually far better than WARN_ON(), I think. Maybe something like this? (Although someone who knows perl probably has a better way) --- Don't tell people to change BUILD_BUG_ON() to WARN_ON(). Signed-off-by: Roland Dreier <[EMAIL PROTECTED]> --- checkpatch.pl.orig 2007-04-27 20:30:34.0 -0700 +++ checkpatch.pl 2007-04-27 22:54:42.0 -0700 @@ -123,7 +123,7 @@ $warnings += search(qr/kernel_thread\(/, "Use kthread abstraction instead of kernel_thread()\n"); $warnings += search(qr/typedef/, "Do not add new typedefs.\n"); $warnings += search(qr/uint32_t/, "Incorrect type usage for kernel code. Use __u32 etc.\n"); - $warnings += search(qr/BUG(_ON)\(/, "Use WARN_ON & Recovery code rather than BUG() and BUG_ON()\n"); + $warnings += search(qr/(?http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
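The idea behind Roland's change is to keep flagging runtime `BUG()`/`BUG_ON()` calls while leaving compile-time `BUILD_BUG_ON()` alone. As a C illustration of that matching rule only (checkpatch itself is Perl, and `flags_runtime_bug` is a hypothetical helper, not part of the script), one can look at the characters just before each match:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `line` contains a BUG( or BUG_ON( call that is
 * not part of BUILD_BUG_ON(. This is a plain substring search, so names that
 * merely end in BUG( or BUG_ON( (VM_BUG_ON, or a macro called DEBUG) are
 * still reported. */
static bool flags_runtime_bug(const char *line)
{
    static const char *const needles[] = { "BUG(", "BUG_ON(" };
    const size_t prefix_len = strlen("BUILD_");

    for (size_t i = 0; i < sizeof(needles) / sizeof(needles[0]); i++) {
        const char *p = line;

        while ((p = strstr(p, needles[i])) != NULL) {
            bool is_build = (size_t)(p - line) >= prefix_len &&
                            strncmp(p - prefix_len, "BUILD_", prefix_len) == 0;
            if (!is_build)
                return true;
            p++; /* keep looking after this BUILD_BUG_ON( match */
        }
    }
    return false;
}
```

Fed the BCM43xx line quoted above, it stays quiet; fed `BUG_ON(!page);` it reports a hit.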
Re: checkpatch, a patch checking script.
> box:/usr/src/25> ~/checkpatch.pl patches/git-infiniband.patch Yup, I ran this too. > Checking patches/git-infiniband.patch: signoffs = 113 > Use WARN_ON & Recovery code rather than BUG() and BUG_ON() > 8143:+ BUG_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); > 12629:+ BUG_ON(cmd->free_head < 0); > 16580:+ BUG_ON(index < dev->caps.num_mgms); > 16665:+ BUG_ON(amgm_index_to_free < dev->caps.num_mgms); > 16681:+ BUG_ON(index < dev->caps.num_mgms); I agree -- killing the kernel for a driver bug is dumb. I'll remove all these BUGs before merging. > Don't init statics to 0/NULL: > 10333:+ path->static_rate = 0; This is a false positive/opportunity for script improvement, obviously. > 15461:+static int msi_x = 0; > 16113:+ static int mlx4_version_printed = 0; Already zapped these. - R. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
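The `path->static_rate = 0;` false positive comes from matching the letters "static" anywhere in the line. One hedged way to tighten such a check, sketched in C (the helper names are hypothetical and this is not how checkpatch.pl is written), is to require `static` as a whole word before looking for an `= 0;` or `= NULL;` initializer:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical check: true if `line` uses the `static` keyword as a whole
 * word and initialises the declaration to 0 or NULL. Identifiers such as
 * static_rate do not count as the keyword. */
static bool inits_static_to_zero(const char *line)
{
    const char *p = line;
    bool has_static = false;

    while ((p = strstr(p, "static")) != NULL) {
        bool start_ok = (p == line) || !is_ident_char(p[-1]);
        bool end_ok = !is_ident_char(p[6]);
        if (start_ok && end_ok) {
            has_static = true;
            break;
        }
        p += 6;
    }
    if (!has_static)
        return false;

    return strstr(p, "= 0;") != NULL || strstr(p, "= NULL;") != NULL;
}
```

It still reports `static int msi_x = 0;` but no longer trips over the structure-member assignment.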
Re: - maps2-add-proc-pid-pagemap-interface-fix.patch removed from -mm tree
On Sat, 28 Apr 2007 06:13:39 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote: > On Fri, 27 Apr 2007, Andrew Morton wrote: > > > > hm, could do. might_sleep() is intertwined with preempt in complex ways, > > but we did decouple that at the config level. no_mmap_sem() will dtrt for > > all preempt settings. > > > > But I'll be keeping this as a -mm-only debug patch (which brings us up to > > about thirty of 'em), so I think it's best to make it unconfigurable so we > > get maximum coverage. > > > > That's if it actually works. I haven't tried running it yet, and I have a > > feeling that running it might cause a big "doh" moment. We'll see. > > Yes, I'm expecting the crucial > > > + WARN_ON(rwsem_is_locked(>mmap_sem)) > > to give a bogus warning every time another thread (or /proc, > or swapoff, or whatever) happens to have this mmap_sem locked. > might_sleep() is quite different, works on our thread's info. > Yes. lockdep has a way of working out if this task already has a particular lock for reading or writing, but it isn't immediately obvious how to extract that. I guess a simple hack would be to do a down_read() on it. If it's already held for reading, lockdep should warn. If it's already held for writing, someone will notice. Oh well, it's not my top priority.
Re: [patch 00/33] 2.6.20-stable review
On Sat, Apr 28, 2007 at 12:21:24PM +0800, Bryan WU wrote: > On Fri, 2007-04-27 at 08:13 -0700, Greg KH wrote: > > On Fri, Apr 27, 2007 at 06:15:54PM +0800, Wu, Bryan wrote: > > > > > > You know, for some customers' products, they want to use the stable and > > > long-term support kernel instead of the latest one. > > > > Then they should get that support from a vendor, not from the kernel.org > > releases :) > > > > Yeah, but we are the vendor as you mentioned. -:)) Ah, then you already know what to do :) > If we want to release a kernel for customer product development, how should we > choose the stable version? That's up to you. > Currently, we have always followed the kernel release cycle/rules and given > customers the latest stable version. Ok, then what has really changed here? We've been doing this .y release thing (also called -stable) for about 2 years now; nothing is different this week from last. Confused, greg k-h
Re: [PATCH] Allow __vmalloc with GFP_ATOMIC
--- Nick Piggin <[EMAIL PROTECTED]> wrote: >> The patch below uses a bh-disabled lock for vmlist_lock, so >> that __vmalloc can be used in interrupt context. > Hi Giri, > > I'm sure I've read the reason for this one before, but when you do patches > like these, can you include that reason in the changelog please? > > Thanks, > Nick Sorry about that. There were too many mails on this subject, and I thought it might not be good to quote them. I am quoting here the ones that matter to this discussion. If you need more (all), let me know: http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1608.html http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1611.html http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1656.html http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1779.html http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1669.html Thanks, Giri
Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
On Sat, 28 Apr 2007, Mikulas Patocka wrote: On Fri, 27 Apr 2007, Bill Huey wrote: Hi SpadFS doesn't write to unallocated parts like log filesystems (LFS) or phase tree filesystems (TUX2); --- BTW, I don't think that writing to unallocated parts of the disk is a good idea. These filesystems have cool write benchmarks, but they have one subtle (and unbenchmarkable) problem: they group files according to the time when they were created, not according to the directory hierarchy. When the user has a directory with project files and has edited different files at different times, normal filesystems will place the files near each other (so that "grep blabla *" is fast), while log-structured filesystems will scatter the files over the whole disk. Mikulas
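Mikulas's locality argument can be illustrated with a toy allocator model. It is entirely hypothetical and far simpler than any real filesystem: files from several directories are written in interleaved order; a log-style allocator puts each write at the next block, while a directory-grouped allocator puts each file inside its directory's region. The block spread of one directory is a rough proxy for how far the disk head travels for `grep blabla *`.

```c
#include <assert.h>
#include <stddef.h>

struct write_op {
    int dir;  /* directory the file belongs to */
    int file; /* file index within that directory */
};

/* Log-style placement: every write goes to the next block, in time order. */
static void place_log(const struct write_op *ops, size_t n, int *block_of)
{
    (void)ops; /* placement depends only on write order */
    for (size_t i = 0; i < n; i++)
        block_of[i] = (int)i;
}

/* Directory-grouped placement: directory d owns blocks
 * [d * region, (d + 1) * region) and file f sits at offset f inside it. */
static void place_grouped(const struct write_op *ops, size_t n, int region,
                          int *block_of)
{
    for (size_t i = 0; i < n; i++)
        block_of[i] = ops[i].dir * region + ops[i].file;
}

/* Distance between the lowest and highest block used by directory `dir`. */
static int dir_spread(const struct write_op *ops, const int *block_of,
                      size_t n, int dir)
{
    int lo = -1, hi = -1;

    for (size_t i = 0; i < n; i++) {
        if (ops[i].dir != dir)
            continue;
        if (lo < 0 || block_of[i] < lo)
            lo = block_of[i];
        if (hi < 0 || block_of[i] > hi)
            hi = block_of[i];
    }
    return lo < 0 ? 0 : hi - lo;
}
```

With writes to directory 0 interleaved among three other directories, the log placement spreads directory 0's three files over 8 blocks, while grouped placement keeps them within 2.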
Re: Problem 2.6.21
On Friday 27 April 2007 14:39, Riccardo Ricci wrote: > > Hi everyone, > I've compiled kernel 2.6.21 on my Debian PIII 650 / 256MB / Dell Latitude > J650GT. With 2.6.20.8 everything works very well; with 2.6.21 it doesn't boot... While > booting it stops after ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 9 > 10 11 12 14 15). Does 2.6.20.8 boot with acpi=off? Does 2.6.21? Any chance you can get the serial console log of the failure when booted with "debug"? Also, the 2.6.20.8 dmesg is missing the beginning; try dmesg -s64000 -- though it will probably not be very interesting until 2.6.21 output is available to compare it to. thanks, -Len
Re: [00/17] Large Blocksize Support V3
On Fri, 27 Apr 2007 22:08:17 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Fri, 27 Apr 2007, Andrew Morton wrote: > > > My (repeated) point is that if we populate pagecache with > > physically-contiguous 4k > > pages in this manner, then bio+block will be able to create much larger SG > > lists. > > True, but the "if" becomes exceedingly rare the longer the system has been in > operation. 64k implies 16 pages in sequence. This is going to be a bit > difficult to get. Nonsense. We need higher-order allocations whichever scheme is used. And lumpy reclaim in the moveable zone should be extremely reliable. It _should_ be the case that it can only be defeated by excessive use of mlock. But we've seen no testing to either confirm or refute that. > Then there is the overhead of handling these pages. > This may not be significant given growing processor capabilities in some > usage cases. In others, like a synchronized application running on a large > number of nodes, this is likely to introduce random delays in processor-to-processor > communication that will significantly impair performance. Well, who knows. > And then there is the long list of features that cannot be accomplished > with such an approach, like mounting a volume with a large block size, > handling CDs/DVDs, getting rid of various shim layers, etc. There are disadvantages against which this must be traded off. And if the volume which is mounted with the large page option also has a lot of small files on it, we've gone and dramatically deoptimised the user's machine. It would have been better to make the 4k-page implementation faster, rather than working around existing inefficiencies. > I'd also like to have much higher orders of allocations for scientific > applications that require an extremely large I/O rate. For those we > could, for example, dedicate memory nodes that will only use a very high page > order to prevent fragmentation. E.g.
1G pages are certainly something that > lots of our customers would find beneficial (and they are actually > already using those types of pages in the form of huge pages, but with > limited capabilities). > > But then we are sadly again trying to find another workaround that > will not get us there and will not allow the flexibility in the > VM that would make things much easier for lots of usage scenarios. Your patch *is* a workaround. It's a workaround for small CPU pagesize. It's a workaround for suboptimal VFS and filesystem implementations. It's a workaround for a disk adapter which has suboptimal readahead and writeback caching implementations. See? I can spin too. Fact is, this change has *costs*. And you're completely ignoring them, trying to spin them away. It ain't working and it never will. I'm seeing no serious attempt to think about how we can reduce those costs while retaining most of the benefits.
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 27 Apr 2007, Rohit Seth wrote: > On Fri, 2007-04-27 at 15:18 +0100, Hugh Dickins wrote: > > Right. Extra flush_icache_page routines will add cost to archs that > have a non-null definition of this routine. BTW, isn't flush_icache_page > marked for deprecation? Yes, flush_icache_page is marked for deprecation: but that's hardly a reason to add another under a different name! (Not quite what you did, but...) > lazy_mmu_prot_update was added specifically for notifying changes in > protection. So, in a way, it is closer to update_mmu_cache (which is for > changes in the mappings themselves). Though for the ia64 implementation, this ends > up flushing the icaches when needed. The ia64 implementation is the only one which has any use for it, and it's only interested when it's executable, i.e. "lazy_mmu_prot_update" is a name concealing some overdesign. > Hopefully my reply is useful. Yes, thanks Rohit, and I'll want to read through it again later. In particular, I now have a better idea of what's "lazy" about it. Hugh
[-mm Patch]nbd: check the return value of sysfs_create_file
Since 'sysfs_create_file' is declared with the attribute warn_unused_result, we must always check its return value carefully. Signed-off-by: WANG Cong <[EMAIL PROTECTED]> --- --- linux-2.6.21-rc7-mm2/drivers/block/nbd.c.orig 2007-04-27 17:27:47.0 +0800 +++ linux-2.6.21-rc7-mm2/drivers/block/nbd.c 2007-04-27 17:47:32.0 +0800 @@ -373,7 +373,10 @@ static void nbd_do_it(struct nbd_device BUG_ON(lo->magic != LO_MAGIC); lo->pid = current->pid; - sysfs_create_file(>disk->kobj, _attr.attr); + if (sysfs_create_file(>disk->kobj, _attr.attr)) { + printk(KERN_ERR "nbd: sysfs_create_file failed!"); + return; + } while ((req = nbd_read_stat(lo)) != NULL) nbd_end_request(req);
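The patch exists because the compiler complains when the result of a function marked `warn_unused_result` is dropped. A userspace sketch with the GCC/Clang attribute shows the pattern; `create_attr_file` and `setup_device` are hypothetical stand-ins for `sysfs_create_file()` and its caller, and the sketch ends its error message with a newline, as kernel log lines conventionally are.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for sysfs_create_file(): 0 on success, a negative
 * errno value on failure. The attribute makes GCC and Clang warn if a
 * caller ignores the return value. */
static int create_attr_file(int should_fail) __attribute__((warn_unused_result));

static int create_attr_file(int should_fail)
{
    return should_fail ? -ENOMEM : 0;
}

/* Caller that checks the result and propagates the error. */
static int setup_device(int should_fail)
{
    int err = create_attr_file(should_fail);

    if (err) {
        fprintf(stderr, "demo: create_attr_file failed: %d\n", err);
        return err;
    }
    return 0;
}
```

Writing a bare `create_attr_file(0);` statement instead would trigger the `-Wunused-result` warning that prompted the nbd change.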
Re: - maps2-add-proc-pid-pagemap-interface-fix.patch removed from -mm tree
On Fri, 27 Apr 2007, Andrew Morton wrote: > > hm, could do. might_sleep() is intertwined with preempt in complex ways, > but we did decouple that at the config level. no_mmap_sem() will dtrt for > all preempt settings. > > But I'll be keeping this as a -mm-only debug patch (which brings us up to > about thirty of 'em), so I think it's best to make it unconfigurable so we > get maximum coverage. > > That's if it actually works. I haven't tried running it yet, and I have a > feeling that running it might cause a big "doh" moment. We'll see. Yes, I'm expecting the crucial > + WARN_ON(rwsem_is_locked(>mmap_sem)) to give a bogus warning every time another thread (or /proc, or swapoff, or whatever) happens to have this mmap_sem locked. might_sleep() is quite different, works on our thread's info. Hugh
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Sat, 28 Apr 2007, Nick Piggin wrote: > > OIC, you need a virtual address to evict the icache, so you can't > flush at flush_dcache time? Or does ia64 have an instruction to > flush the whole icache? (it would be worth testing, to see how much > performance suffers). I'm puzzled by that remark: the ia64 flush_icache_range always has a virtual address, it uses the kernel virtual address; it takes no interest in whether there's a user virtual address. Hugh
Re: checkpatch, a patch checking script.
On Fri, 27 Apr 2007 23:08:05 -0400 Dave Jones <[EMAIL PROTECTED]> wrote: > You can find the script at http://www.codemonkey.org.uk/projects/checkpatch/ hm. box:/usr/src/25> ~/checkpatch.pl patches/slub-core.patch Checking patches/slub-core.patch: signoffs = 30 Use WARN_ON & Recovery code rather than BUG() and BUG_ON() 1588:+ VM_BUG_ON(!irqs_disabled()); 1834:+ BUG_ON(flags & ~(GFP_DMA | GFP_LEVEL_MASK)); 2538:+ BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node)); 2544:+ BUG_ON(!page); 2546:+ BUG_ON(!n); 2736:+ BUG_ON(err); 2762:+ BUG_ON(flags & SLUB_UNIMPLEMENTED); 2777:+ BUG_ON(flags & (SLAB_RED_ZONE | SLAB_POISON | 2779:+ BUG_ON(ctor || dtor); 3054:+ BUG_ON(index < 0); 3118:+ BUG_ON(!page); 3120:+ BUG_ON(!s); 4062:+ BUG_ON(!name); 4083:+ BUG_ON(p > name + ID_STR_LENGTH - 1); 4188:+ BUG_ON(err); 15 warnings surely we can do better than that ;) box:/usr/src/25> ~/checkpatch.pl patches/git-ieee1394.patch Checking patches/git-ieee1394.patch: signoffs = 291 Do not add new typedefs. 
5239:+typedef int (*descriptor_callback_t)(struct context *ctx, 7254:+typedef void (*scsi_done_fn_t) (struct scsi_cmnd *); 8668:+typedef void (*fw_node_callback_t) (struct fw_card * card, 10077:+typedef void (*fw_packet_callback_t) (struct fw_packet *packet, 10080:+typedef void (*fw_transaction_callback_t)(struct fw_card *card, int rcode, 10085:+typedef void (*fw_address_callback_t)(struct fw_card *card, 10093:+typedef void (*fw_bus_reset_callback_t)(struct fw_card *handle, 10245:+typedef void (*fw_iso_callback_t) (struct fw_iso_context *context, Use WARN_ON & Recovery code rather than BUG() and BUG_ON() 4342:+ BUG_ON(j >= ARRAY_SIZE(group->attrs)); 9868:+ BUG_ON(retval < 0); 9872:+ BUG_ON(retval < 0); 9876:+ BUG_ON(retval < 0); 9878:+ BUG_ON(retval < 0); 10952:+ BUG_ON(!kv || !associate || kv->key.id == CSR1212_KV_ID_DESCRIPTOR || 10983:+ BUG_ON(!kv || !dir || dir->key.type != CSR1212_KV_TYPE_DIRECTORY); 11396:+ BUG_ON(!csr || !csr->ops || !csr->ops->allocate_addr_range || 11750:+ BUG_ON(!csr); 11802:+ BUG_ON(csr->max_rom < 1); 12106:+ BUG_ON(!csr || !kv || csr->max_rom < 1); 12248:+ BUG_ON(!csr || !csr->ops || !csr->ops->bus_read); 14541:+ BUG_ON(max_payload < 512 - ETHER1394_GASP_OVERHEAD); 14567:+ BUG_ON(max_payload < 512 - ETHER1394_GASP_OVERHEAD); 15213:+ BUG_ON(!list_empty(>driver_list) || 23 warnings ok. box:/usr/src/25> ~/checkpatch.pl patches/git-net.patch Checking patches/git-net.patch: signoffs = 831 Do not add new typedefs. 18871:+typedef unsigned int sk_buff_data_t; 18873:+typedef unsigned char *sk_buff_data_t; 20686:+typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *, void *); 20687:+typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *); Incorrect type usage for kernel code. Use __u32 etc. 
21854:+uint32_t __attribute__((weak)) __div64_32(uint64_t *n, uint32_t base) 21865:+ uint32_t high, d; Use WARN_ON & Recovery code rather than BUG() and BUG_ON() 11084:+ BUG_ON(ip_hdr(skb)->protocol != IPPROTO_TCP); 21600:+ BUG_ON(!wiphy); 21633:+ BUG_ON(!wdev); 25577:+ BUG_ON(r->ctarget != NULL); 26832:+ BUG_ON(msgindex < 0 || msgindex >= RTM_NR_MSGTYPES); 26882:+ BUG_ON(protocol < 0 || protocol >= NPROTO); 26936:+ BUG_ON(protocol < 0 || protocol >= NPROTO); 26959:+ BUG_ON(protocol < 0 || protocol >= NPROTO); 27772:+ BUG_ON(len); 30411:+ BUG_ON(hctx->ccid3hctx_p && !hctx->ccid3hctx_x_calc); 30626:+ BUG_ON(hctx == NULL); 32199:+ BUG_ON(ptr == NULL); 32217:+ BUG_ON(ptr == NULL); 58250:+ BUILD_BUG_ON(sizeof(struct illinois) > ICSK_CA_PRIV_SIZE); 61747:+ BUG_ON(sizeof(struct yeah) > ICSK_CA_PRIV_SIZE); 63079:+ BUG_ON(pad < 0); 69953:+ BUG_ON(sk == NULL); 69962:+ BUG_ON(self == NULL); 70883:+ BUG_ON(destroy == NULL); 80348:+ BUG_ON(!wiphy); Don't init statics to 0/NULL: 61061:+static int port __read_mostly = 0; 70417:+static int hashbin_lock_depth = 0; 28 warnings Bad David. git-ocfs2.patch: couple of new typedefs, zillions of BUG_ONs box:/usr/src/25> ~/checkpatch.pl patches/git-libata-all.patch Checking patches/git-libata-all.patch: signoffs = 167 Do not add new typedefs.
875:+typedef unsigned long u64; 876:+typedef unsigned int u32; 878:+typedef union err_type_info_u { 890:+typedef union err_struct_info_u { 930:+typedef union err_data_buffer_u { 954:+typedef union capabilities_u { 1009:+typedef struct resources_s { 1443:+typedef struct { box:/usr/src/25>
Re: [00/17] Large Blocksize Support V3
On Fri, 27 Apr 2007, Andrew Morton wrote: > My (repeated) point is that if we populate pagecache with > physically-contiguous 4k > pages in this manner, then bio+block will be able to create much larger SG > lists. True, but the "if" becomes exceedingly rare the longer the system has been in operation. 64k implies 16 pages in sequence. This is going to be a bit difficult to get. Then there is the overhead of handling these pages. This may not be significant given growing processor capabilities in some usage cases. In others, like a synchronized application running on a large number of nodes, this is likely to introduce random delays in processor-to-processor communication that will significantly impair performance. And then there is the long list of features that cannot be accomplished with such an approach, like mounting a volume with a large block size, handling CDs/DVDs, getting rid of various shim layers, etc. I'd also like to have much higher orders of allocations for scientific applications that require an extremely large I/O rate. For those we could, for example, dedicate memory nodes that will only use a very high page order to prevent fragmentation. E.g. 1G pages are certainly something that lots of our customers would find beneficial (and they are actually already using those types of pages in the form of huge pages, but with limited capabilities). But then we are sadly again trying to find another workaround that will not get us there and will not allow the flexibility in the VM that would make things much easier for lots of usage scenarios.
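The arithmetic behind "64k implies 16 pages in sequence" is the allocation order: a block built from base pages needs 2^order physically contiguous pages. The helper below is a hypothetical sketch of that calculation (similar in purpose to the kernel's `get_order()`, but not its implementation) and assumes a 4 KiB base page.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Smallest order such that (1 << order) base pages cover `size` bytes. */
static unsigned int alloc_order(uint64_t size)
{
    unsigned int order = 0;
    uint64_t covered = (uint64_t)1 << DEMO_PAGE_SHIFT;

    while (covered < size) {
        covered <<= 1;
        order++;
    }
    return order;
}
```

On this assumption a 64 KiB block is an order-4 allocation (16 pages) and a 1 GiB page is order 18 (262144 pages), which shows how much harder the second is to find without dedicated memory.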
Re: [patch 20/38] Minor fault path optimization.
Martin Schwidefsky writes: > The minor fault path has grown a lot in terms of cycles. In particular, > the kprobes hook is very costly. Optimize the path to save a couple of > cycles. If kprobes is enabled, more than 300 cycles can be avoided if > kprobes_running() is false. There's no good reason to use a notifier for page faults, since there's only one external piece of code that wants to know about them... Regards, Paul.
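Paul's point is that with a single consumer, a direct hook guarded by a cheap check can skip the callback entirely on the common path, instead of walking a notifier chain on every fault. The sketch contrasts the two shapes; all names are hypothetical, and the counter only records how many callbacks each path invokes, not real cycle costs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Notifier-chain shape: every fault walks a list of callbacks. */
struct fault_notifier {
    bool (*fn)(unsigned long addr);
    struct fault_notifier *next;
};

static int callbacks_invoked;

static bool notify_chain(const struct fault_notifier *head, unsigned long addr)
{
    for (; head != NULL; head = head->next) {
        callbacks_invoked++;
        if (head->fn(addr))
            return true; /* a callback handled the fault */
    }
    return false;
}

/* Direct-hook shape: one pointer plus a cheap "is anyone active?" flag,
 * in the spirit of testing kprobes_running() before doing more work. */
static bool (*kprobe_fault_hook)(unsigned long addr);
static bool kprobe_active;

static bool notify_direct(unsigned long addr)
{
    if (!kprobe_active || kprobe_fault_hook == NULL)
        return false; /* fast path: no call at all */
    callbacks_invoked++;
    return kprobe_fault_hook(addr);
}

static bool never_handles(unsigned long addr)
{
    (void)addr;
    return false;
}
```

With the hook installed but inactive, `notify_direct()` returns without calling anything, whereas the chain always invokes its registered callback.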
Re: [00/17] Large Blocksize Support V3
On Sat, 28 Apr 2007 13:17:40 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > Fix up your lameo HBA for reads. > > Where did that come from? You spent 20 lines describing the inefficiencies > of the readahead in the page cache and saying it should be fixed, but then you > turn around and say fix the HBA? My (repeated) point is that if we populate pagecache with physically-contiguous 4k pages in this manner, then bio+block will be able to create much larger SG lists.
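Andrew's point can be shown with a small model of scatter-gather list construction: consecutive pagecache pages whose page frame numbers are also consecutive can share one segment. The function is a hypothetical sketch; real bio/block merging also honours per-device segment-size and boundary limits, which are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Count the scatter-gather segments needed for pages listed in file order,
 * merging a page into the previous segment when its frame number directly
 * follows the previous page's frame number. */
static size_t count_sg_segments(const unsigned long *pfns, size_t n)
{
    size_t segs = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || pfns[i] != pfns[i - 1] + 1)
            segs++;
    return segs;
}
```

Sixteen physically contiguous 4k pages (64k) collapse into one segment; the same sixteen pages scattered across memory need sixteen.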
Re: commit 45cd8d8e -- why?
Andrew Morton wrote: > On Fri, 27 Apr 2007 19:50:19 -0700 Roland Dreier <[EMAIL PROTECTED]> wrote: > >> The changelog says: >> >> fs/sysfs/bin.c: In function 'read': >> fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', >> but argument 4 has type 'int' >> >> but the signature of the function read() is >> >> read(struct file * file, char __user * userbuf, size_t count, loff_t * >> off) >> >> and git blame seems to show it was always thus -- ie count was always size_t. >> >> And now on x86-64 and ia64 with gcc 4.1 at least, I get: >> >> fs/sysfs/bin.c: In function 'read': >> fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument >> 4 has type 'size_t' > > Some patches landed out of order. In Greg's tree (with Tejun's patches) > `count' is a local variable (not an incoming arg) of type `int'. > > So this patch was against Tejun's stuff, not against mainline. > > I'd have picked that up, but I went and assumed that it was a victim of the > new dev_dbg() printk arg checking stuff. Ho hum. Ah... I already have this fix merged in my patch series. I'm currently testing things, so please be patient a little longer. Thanks. -- tejun
[PATCH] lguest simplification: don't pin guest trap handlers
We don't actually need the Guest handlers mapped to avoid double fault, just the stack pages. Thanks to Zach for confirming. Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> --- drivers/lguest/interrupts_and_traps.c | 26 +- drivers/lguest/lg.h |2 +- drivers/lguest/page_tables.c |6 +++--- 3 files changed, 5 insertions(+), 29 deletions(-) === --- a/drivers/lguest/interrupts_and_traps.c +++ b/drivers/lguest/interrupts_and_traps.c @@ -138,31 +138,12 @@ static int direct_trap(const struct lgue return idt_type(trap->a, trap->b) == 0xF; } -static void pin_stack_pages(struct lguest *lg) +void pin_stack_pages(struct lguest *lg) { unsigned int i; for (i = 0; i < lg->stack_pages; i++) pin_page(lg, lg->esp1 - i * PAGE_SIZE); -} - -/* We need to ensure all the direct trap pages are mapped after we - * clear shadow mappings. */ -void pin_trap_pages(struct lguest *lg) -{ - unsigned int i; - struct desc_struct *trap; - - for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++) { - trap = >idt[i]; - if (direct_trap(lg, trap, i)) - pin_page(lg, idt_address(trap->a, trap->b)); - } - - trap = >syscall_idt; - if (direct_trap(lg, trap, SYSCALL_VECTOR)) - pin_page(lg, idt_address(trap->a, trap->b)); - pin_stack_pages(lg); } void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages) @@ -194,11 +175,6 @@ static void set_trap(struct lguest *lg, trap->a = ((__KERNEL_CS|GUEST_PL)<<16) | (lo&0x); trap->b = (hi&0xEF00); - - /* Make sure trap address is available so we don't fault. In -* theory, it could overlap two pages, in practice it's aligned. 
*/ - if (direct_trap(lg, trap, num)) - pin_page(lg, idt_address(lo, hi)); } void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi) === --- a/drivers/lguest/lg.h +++ b/drivers/lguest/lg.h @@ -190,7 +190,7 @@ int deliver_trap(struct lguest *lg, unsi int deliver_trap(struct lguest *lg, unsigned int num); void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi); void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages); -void pin_trap_pages(struct lguest *lg); +void pin_stack_pages(struct lguest *lg); void setup_default_idt_entries(struct lguest_ro_state *state, const unsigned long *def); void copy_traps(const struct lguest *lg, struct desc_struct *idt, === --- a/drivers/lguest/page_tables.c +++ b/drivers/lguest/page_tables.c @@ -186,7 +186,7 @@ void pin_page(struct lguest *lg, unsigne void pin_page(struct lguest *lg, unsigned long vaddr) { if (!page_writable(lg, vaddr) && !demand_page(lg, vaddr, 0)) - kill_guest(lg, "bad trap page %#lx", vaddr); + kill_guest(lg, "bad stack page %#lx", vaddr); } static void release_pgd(struct lguest *lg, spgd_t *spgd) @@ -253,7 +253,7 @@ void guest_new_pagetable(struct lguest * newpgdir = new_pgdir(lg, pgtable, ); lg->pgdidx = newpgdir; if (repin) - pin_trap_pages(lg); + pin_stack_pages(lg); } static void release_all_pagetables(struct lguest *lg) @@ -269,7 +269,7 @@ void guest_pagetable_clear_all(struct lg void guest_pagetable_clear_all(struct lguest *lg) { release_all_pagetables(lg); - pin_trap_pages(lg); + pin_stack_pages(lg); } static void do_set_pte(struct lguest *lg, int idx,
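After this patch only the Guest's kernel stack pages are pinned. The loop in `pin_stack_pages()` walks down from `lg->esp1` one page per iteration; the userspace sketch below reproduces just that address arithmetic so the touched addresses are easy to see. `DEMO_PAGE_SIZE` (4 KiB assumed) and `collect_stack_pages` are hypothetical, and the real function calls `pin_page()` on each address instead of recording it.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL /* assumption: 4 KiB pages, as on i386 */

/* Same loop shape as pin_stack_pages(): page i is at esp1 - i * PAGE_SIZE,
 * so the stack is covered from the top down. Records at most out_len
 * addresses and returns how many were recorded. */
static size_t collect_stack_pages(unsigned long esp1, unsigned int stack_pages,
                                  unsigned long *out, size_t out_len)
{
    size_t n = 0;

    for (unsigned int i = 0; i < stack_pages && n < out_len; i++)
        out[n++] = esp1 - i * DEMO_PAGE_SIZE;
    return n;
}
```

For a two-page stack topped at `0xc0402000`, the addresses visited are `0xc0402000` and `0xc0401000`.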
Re: commit 45cd8d8e -- why?
On Fri, 27 Apr 2007 19:50:19 -0700 Roland Dreier <[EMAIL PROTECTED]> wrote: > The changelog says: > > fs/sysfs/bin.c: In function 'read': > fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', > but argument 4 has type 'int' > > but the signature of the function read() is > > read(struct file * file, char __user * userbuf, size_t count, loff_t * > off) > > and git blame seems to show it was always thus -- ie count was always size_t. > > And now on x86-64 and ia64 with gcc 4.1 at least, I get: > > fs/sysfs/bin.c: In function 'read': > fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument > 4 has type 'size_t' Some patches landed out of order. In Greg's tree (with Tejun's patches) `count' is a local variable (not an incoming arg) of type `int'. So this patch was against Tejun's stuff, not against mainline. I'd have picked that up, but I went and assumed that it was a victim of the new dev_dbg() printk arg checking stuff. Ho hum.
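Both warnings come from gcc's printf format checking: the conversion has to match the argument's actual type, and `count' has a different type in the two trees. A small user-space illustration; format_count() is a made-up helper, and printk accepts the same z length modifier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a byte count whose type is size_t.  %zu is the conversion for
 * size_t (%zd for the signed type of the same size, such as ssize_t);
 * handing a size_t to %d, or an int to %zd, is what gcc warns about. */
static int format_count(char *buf, size_t len, size_t count)
{
	assert(buf != NULL);
	return snprintf(buf, len, "read %zu bytes", count);
}
```

If the variable were an int, as in Tejun's tree, the matching conversion would be plain %d instead.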
[PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory
With lazy freeing of anonymous pages through MADV_FREE, performance of the MySQL sysbench workload more than doubles on my quad-core system. Madvise with MADV_FREE is used by applications to tell the kernel that memory no longer contains useful data and can be reclaimed by the kernel if it is needed elsewhere. However, if the application puts new data in the page (dirty bit gets set by hardware), the kernel will not throw away the data. This makes applications that free() and then later on malloc() the same data again run a lot faster, since page faults are avoided. In low memory situations, the kernel still knows which pages to reclaim. "Doing it all in userspace" is not a good solution for this problem, because if the system needs the memory it is way cheaper to just throw away these freed pages than to do the disk IO of swapping them out and back in. Signed-off-by: Rik van Riel <[EMAIL PROTECTED]> --- linux-2.6.21.noarch/mm/rmap.c.madv_free 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.noarch/mm/rmap.c 2007-04-27 16:03:22.0 -0400 @@ -656,7 +656,17 @@ static int try_to_unmap_one(struct page /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (PageAnon(page)) { + /* MADV_FREE is used to lazily free memory from userspace. */ + if (PageLazyFree(page) && !migration) { + if (unlikely(pte_dirty(pteval))) { + /* There is new data in the page. Reinstate it. */ + set_pte_at(mm, address, pte, pteval); + ret = SWAP_FAIL; + goto out_unmap; + } + /* Throw the page away. 
*/ + dec_mm_counter(mm, anon_rss); + } else if (PageAnon(page)) { swp_entry_t entry = { .val = page_private(page) }; if (PageSwapCache(page)) { --- linux-2.6.21.noarch/mm/page_alloc.c.madv_free 2007-04-27 16:03:22.0 -0400 +++ linux-2.6.21.noarch/mm/page_alloc.c 2007-04-27 16:03:22.0 -0400 @@ -203,6 +203,7 @@ static void bad_page(struct page *page) 1 << PG_slab| 1 << PG_swapcache | 1 << PG_writeback | + 1 << PG_lazyfree | 1 << PG_buddy ); set_page_count(page, 0); reset_page_mapcount(page); @@ -442,6 +443,8 @@ static inline int free_pages_check(struc bad_page(page); if (PageDirty(page)) __ClearPageDirty(page); + if (PageLazyFree(page)) + __ClearPageLazyFree(page); /* * For now, we report if PG_reserved was found set, but do not * clear it, and do not free the page. But we shall soon need @@ -588,6 +591,7 @@ static int prep_new_page(struct page *pa 1 << PG_swapcache | 1 << PG_writeback | 1 << PG_reserved | + 1 << PG_lazyfree | 1 << PG_buddy bad_page(page); --- linux-2.6.21.noarch/mm/memory.c.madv_free 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.noarch/mm/memory.c 2007-04-27 21:12:57.0 -0400 @@ -432,6 +432,7 @@ copy_one_pte(struct mm_struct *dst_mm, s unsigned long vm_flags = vma->vm_flags; pte_t pte = *src_pte; struct page *page; + int dirty = 0; /* pte contains position in swap or file, so copy. */ if (unlikely(!pte_present(pte))) { @@ -466,6 +467,7 @@ copy_one_pte(struct mm_struct *dst_mm, s * in the parent and the child */ if (is_cow_mapping(vm_flags)) { + dirty = pte_dirty(pte); ptep_set_wrprotect(src_mm, addr, src_pte); pte = pte_wrprotect(pte); } @@ -483,6 +485,8 @@ copy_one_pte(struct mm_struct *dst_mm, s get_page(page); page_dup_rmap(page); rss[!!PageAnon(page)]++; + if (dirty && PageLazyFree(page)) + ClearPageLazyFree(page); } out_set_pte: @@ -661,6 +665,28 @@ static unsigned long zap_pte_range(struc (page->index < details->first_index || page->index > details->last_index)) continue; + +/* + * MADV_FREE is used to lazily recycle + * anon memory. 
The process no longer + * needs the data and wants to avoid IO. + */ +if (details->madv_free && PageAnon(page)) { + if (unlikely(PageSwapCache(page)) && + !TestSetPageLocked(page)) { + remove_exclusive_swap_page(page); + unlock_page(page); + } + ptep_test_and_clear_dirty(vma, addr, pte); + ptep_test_and_clear_young(vma, addr, pte); + SetPageLazyFree(page); + if (PageActive(page)) + deactivate_tail_page(page); + /* tlb_remove_page frees it again */ + get_page(page); + tlb_remove_page(tlb, page); + continue; +} } ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); @@ -689,7 +715,8 @@ static unsigned long zap_pte_range(struc * If details->check_mapping, we leave swap entries; * if details->nonlinear_vma, we leave file entries. */ - if (unlikely(details)) + if (unlikely(details && (details->check_mapping || +details->nonlinear_vma))) continue; if (!pte_file(ptent)) free_swap_and_cache(pte_to_swp_entry(ptent)); @@ -755,7 +782,8 @@ static unsigned long unmap_page_range(st pgd_t *pgd; unsigned long next; - if (details && !details->check_mapping && !details->nonlinear_vma) + if (details && !details->check_mapping &&
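From the application side, the hint is a single madvise() call. Rik's patch is from 2007; a separately written MADV_FREE was later merged in Linux 4.5 with the same contract: the kernel may discard the pages under memory pressure, but data written after the hint is kept. A minimal user-space sketch, assuming a Linux-like system. It falls back to MADV_DONTNEED (which drops the pages immediately) when MADV_FREE is missing from the headers or rejected by the running kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of anonymous memory, fill them, tell the kernel the
 * contents are no longer needed, then reuse the first byte.
 * Returns 0 on success, -1 on failure. */
static int lazy_free_then_reuse(size_t len)
{
	unsigned char *buf;
	int rc = -1;

	assert(len > 0);
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 'A', len);		/* data the application has "freed" */

#ifdef MADV_FREE
	/* Lazy: pages are reclaimed only if the kernel needs the memory. */
	rc = madvise(buf, len, MADV_FREE);
#endif
	if (rc != 0)	/* MADV_FREE unknown to the headers or the kernel */
		rc = madvise(buf, len, MADV_DONTNEED);	/* eager: drop now */
	if (rc != 0) {
		munmap(buf, len);
		return -1;
	}

	/* The old 'A' bytes may or may not still be there, so they must not
	 * be relied on.  A fresh write dirties the page again, and from then
	 * on the kernel has to keep the new contents. */
	buf[0] = 'B';
	rc = (buf[0] == 'B') ? 0 : -1;

	munmap(buf, len);
	return rc;
}
```

A malloc() implementation would issue the same call on memory it has just released, so that a later allocation reusing the range avoids a page fault.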
Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
On Fri, 2007-04-27 at 08:18 -0700, Linus Torvalds wrote: > Actually, you don't need to apply the patch - just do > > echo 5 > /proc/sys/vm/dirty_background_ratio > echo 10 > /proc/sys/vm/dirty_ratio That seems to have done the trick. Amarok and GUI aren't exactly speed demons while writeout is happening, but they are not hanging for eternities. -Mike
Re: [patch 00/33] 2.6.20-stable review
On Fri, 2007-04-27 at 08:13 -0700, Greg KH wrote: > On Fri, Apr 27, 2007 at 06:15:54PM +0800, Wu, Bryan wrote: > > > > You know, for some customers' products, they want to use the stable and > > long-term support kernel instead of the latest one. > > Then they should get that support from a vendor, not from the kernel.org > releases :) > Yeah, but we are the vendor, as you mentioned. -:)) If we want to release a kernel for customer product development, how should we choose the stable version? Currently, we follow the kernel release cycle/rules and give customers the latest stable version. Thank you, Greg -Bryan
Re: [PATCH] Allow __vmalloc with GFP_ATOMIC
Giridhar Pemmasani wrote: Until 2.6.19, __vmalloc with GFP_ATOMIC was possible, but __get_vm_area_node would allocate the node itself with GFP_KERNEL, causing a warning. In 2.6.19, this was "fixed" by using the same flags that were passed to __vmalloc also in __get_vm_area_node. However, __get_vm_area_node does BUG_ON(in_interrupt()) now, since vmlist_lock is obtained without disabling bottom halves. The patch below uses bh disabled lock for vmlist_lock, so that __vmalloc can be used in interrupt context. In 2.6.21, __vmalloc with GFP_ATOMIC is used by arch/um/kernel/process.c; __vmalloc is also used in ntfs, xfs, but it is not clear to me if they use it with GFP_ATOMIC or GFP_KERNEL. Thanks, Giri Hi Giri, I'm sure I've read the reason for this one before, but when you do patches like these, can you include that reason in the changelog please? Thanks, Nick -- SUSE Labs, Novell Inc.
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Nick Piggin wrote: Rohit Seth wrote: You mean by user space? If so, then it is user space responsibility to do the appropriate operations (like flush icache in this case). No, I mean places that set PG_arch_1. flush_dcache_page. This can happen for mapped pages in write, splice, install_arg_page looks questionable, direct IO... Oh, and also ptrace! I think I was almost fooled by that attempt to flush the cache in copy_to_user_page. But that also fails if you map the underlying page with multiple virtual addresses (or processes, if the icache is not flushed on ctxsw), because those others won't have their caches flushed, right? -- SUSE Labs, Novell Inc.
[PATCH] Allow __vmalloc with GFP_ATOMIC
Until 2.6.19, __vmalloc with GFP_ATOMIC was possible, but __get_vm_area_node would allocate the node itself with GFP_KERNEL, causing a warning. In 2.6.19, this was "fixed" by using the same flags that were passed to __vmalloc also in __get_vm_area_node. However, __get_vm_area_node does BUG_ON(in_interrupt()) now, since vmlist_lock is obtained without disabling bottom halves. The patch below takes vmlist_lock with bottom halves disabled, so that __vmalloc can be used in interrupt context. In 2.6.21, __vmalloc with GFP_ATOMIC is used by arch/um/kernel/process.c; __vmalloc is also used in ntfs, xfs, but it is not clear to me whether they use it with GFP_ATOMIC or GFP_KERNEL. Thanks, Giri Signed-off-by: Giridhar Pemmasani <[EMAIL PROTECTED]> --- --- linux-2.6.21.orig/./arch/arm/mm/ioremap.c 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.new/./arch/arm/mm/ioremap.c 2007-04-27 23:29:27.0 -0400 @@ -363,7 +363,7 @@ * all the mappings before the area can be reclaimed * by someone else. */ - write_lock(&vmlist_lock); + write_lock_bh(&vmlist_lock); for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) { if((tmp->flags & VM_IOREMAP) && (tmp->addr == addr)) { if (tmp->flags & VM_ARM_SECTION_MAPPING) { @@ -376,7 +376,7 @@ break; } } - write_unlock(&vmlist_lock); + write_unlock_bh(&vmlist_lock); #endif if (!section_mapping) --- linux-2.6.21.orig/./arch/i386/mm/ioremap.c 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.new/./arch/i386/mm/ioremap.c 2007-04-27 23:29:27.0 -0400 @@ -180,12 +180,12 @@ in parallel. Reuse of the virtual address is prevented by leaving it in the global lists until we're done with it. cpa takes care of the direct mappings. */ - read_lock(&vmlist_lock); + read_lock_bh(&vmlist_lock); for (p = vmlist; p; p = p->next) { if (p->addr == addr) break; } - read_unlock(&vmlist_lock); + read_unlock_bh(&vmlist_lock); if (!p) { printk("iounmap: bad address %p\n", addr); --- linux-2.6.21.orig/./arch/x86_64/mm/ioremap.c 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.new/./arch/x86_64/mm/ioremap.c 2007-04-27 23:29:27.0 -0400 @@ -175,12 +175,12 @@ in parallel.
Reuse of the virtual address is prevented by leaving it in the global lists until we're done with it. cpa takes care of the direct mappings. */ - read_lock(&vmlist_lock); + read_lock_bh(&vmlist_lock); for (p = vmlist; p; p = p->next) { if (p->addr == addr) break; } - read_unlock(&vmlist_lock); + read_unlock_bh(&vmlist_lock); if (!p) { printk("iounmap: bad address %p\n", addr); --- linux-2.6.21.orig/./fs/proc/kcore.c 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.new/./fs/proc/kcore.c 2007-04-27 23:29:27.0 -0400 @@ -335,7 +335,7 @@ if (!elf_buf) return -ENOMEM; - read_lock(&vmlist_lock); + read_lock_bh(&vmlist_lock); for (m=vmlist; m && cursize; m=m->next) { unsigned long vmstart; unsigned long vmsize; @@ -363,7 +363,7 @@ memcpy(elf_buf + (vmstart - start), (char *)vmstart, vmsize); } - read_unlock(&vmlist_lock); + read_unlock_bh(&vmlist_lock); if (copy_to_user(buffer, elf_buf, tsz)) { kfree(elf_buf); return -EFAULT; --- linux-2.6.21.orig/./fs/proc/mmu.c 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.new/./fs/proc/mmu.c 2007-04-27 23:29:41.0 -0400 @@ -47,7 +47,7 @@ prev_end = VMALLOC_START; - read_lock(&vmlist_lock); + read_lock_bh(&vmlist_lock); for (vma = vmlist; vma; vma = vma->next) { unsigned long addr = (unsigned long) vma->addr; @@ -72,6 +72,6 @@ if (VMALLOC_END - prev_end > vmi->largest_chunk) vmi->largest_chunk = VMALLOC_END - prev_end; - read_unlock(&vmlist_lock); + read_unlock_bh(&vmlist_lock); } } --- linux-2.6.21.orig/./mm/vmalloc.c 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.new/./mm/vmalloc.c 2007-04-27 23:33:17.0 -0400 @@ -168,7 +168,7 @@ unsigned long align = 1; unsigned long addr; - BUG_ON(in_interrupt()); + BUG_ON(in_irq()); if (flags & VM_IOREMAP) { int bit = fls(size); @@ -193,7 +193,7 @@ */ size += PAGE_SIZE; - write_lock(&vmlist_lock); + write_lock_bh(&vmlist_lock); for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next)
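The vmlist walks in this patch use the pointer-to-pointer idiom, for (p = &vmlist; (tmp = *p) != NULL; p = &tmp->next). This lets the code link a new area in at the slot where the walk stops, without special-casing the list head. Because that link-in rewrites the list, the allocation path holds vmlist_lock for writing, while the lookups only read. Below is a simplified user-space sketch of an address-sorted insert using the same idiom. struct area and insert_sorted() are hypothetical stand-ins for struct vm_struct and the kernel code, and both locking and the free-gap search are left out.

```c
#include <assert.h>
#include <stddef.h>

struct area {			/* hypothetical stand-in for struct vm_struct */
	unsigned long addr;
	struct area *next;
};

/* Insert a into the address-sorted list headed by *head.  p always points
 * at the link that will be rewritten: the head pointer or a ->next field. */
static void insert_sorted(struct area **head, struct area *a)
{
	struct area **p, *tmp;

	assert(head != NULL && a != NULL);
	for (p = head; (tmp = *p) != NULL; p = &tmp->next)
		if (tmp->addr > a->addr)
			break;
	a->next = *p;		/* same two lines for head, middle and tail */
	*p = a;
}
```

Inserting areas at 0x3000, 0x1000 and 0x2000 in that order leaves the list sorted as 0x1000, 0x2000, 0x3000.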
Re: What's in infiniband.git for 2.6.22
> What about the mthca patch to use separate HW queues for kernel > RC/UD/userspace RC? right, I'll queue that up too. BTW is there something analogous we could do for mlx4, or is FW not quite ready? - R.
[git pull] DRM patches for 2.6.22-rc1
Hi Linus, Please pull the 'drm-patches' branch of git://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-patches This contains the drm patch for 2.6.22-rc1, and contains a number of fixes in the mmap code and the locking for AIGLX systems along with new hw support for i965GM. Dave. drivers/char/drm/README.drm| 16 +++-- drivers/char/drm/drm.h |4 +- drivers/char/drm/drmP.h| 23 +-- drivers/char/drm/drm_bufs.c| 75 +++ drivers/char/drm/drm_drv.c |9 +-- drivers/char/drm/drm_fops.c| 96 ++-- drivers/char/drm/drm_hashtab.c | 17 +- drivers/char/drm/drm_hashtab.h |1 - drivers/char/drm/drm_irq.c |4 +- drivers/char/drm/drm_lock.c| 134 --- drivers/char/drm/drm_mm.c |2 + drivers/char/drm/drm_pciids.h |3 +- drivers/char/drm/drm_proc.c|2 +- drivers/char/drm/drm_stub.c|1 - drivers/char/drm/drm_vm.c | 102 --- drivers/char/drm/i915_dma.c|3 +- drivers/char/drm/radeon_cp.c |8 +- drivers/char/drm/sis_drv.c |2 +- drivers/char/drm/via_drv.c |3 +- drivers/char/drm/via_mm.h | 40 20 files changed, 196 insertions(+), 349 deletions(-) commit ce7dd06372058f9e3e57ee4c0aeba694a43a80ad Author: Wang Zhenyu <[EMAIL PROTECTED]> Date: Thu Apr 26 07:42:56 2007 +1000 drm/i915: Add 965GM pci id update Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 9e9c1326a592c677c94d730fcf4446d0e275aef4 Author: Dave Airlie <[EMAIL PROTECTED]> Date: Sat Mar 24 17:57:54 2007 +1100 drm: just use io_remap_pfn_range on all archs.. Move the sparc64 ifdef around to clean this up. Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 38315878a560eede1a2db52e511ad3a2cfbb4206 Author: Hugh Dickins <[EMAIL PROTECTED]> Date: Sat Mar 24 17:55:16 2007 +1100 drm: fix DRM_CONSISTENT mapping This patch got lost in the DRM git tree for ages, bring it back to life. 
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit d7d8aac79dc38cbdef83b774e49bafdae9918137 Author: Thomas Hellstrom Date: Sat Mar 24 17:52:49 2007 +1100 drm: fix up mmap locking in preparation for ttm changes This change is needed to protect against disappearing maps which aren't common. The map lists are protected using struct_mutex but drm_mmap never locked it. Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 040ac32048d5efabd557c1e0a6ab8aec2c710c56 Author: Thomas Hellstrom Date: Fri Mar 23 13:28:33 2007 +1100 drm: fix driver deadlock with AIGLX and reclaim_buffers_locked Bugzilla Bug #9457 Add refcounting of user waiters to the DRM hardware lock, so that we can use DRM_LOCK_CONT flag more conservatively. Also add a kernel waiter refcount that if nonzero transfers the lock for the kernel context when it is released. This is useful when waiting for idle and can be used for very simple fence object driver implementations for the new memory manager Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 4b560fde06aeb342f3ff0bce924627ab722d251a Author: Andrew Morton <[EMAIL PROTECTED]> Date: Mon Mar 19 09:08:21 2007 +1100 drm: fix warning in drm_fops.c drivers/char/drm/drm_fops.c: In function 'drm_setup': drivers/char/drm/drm_fops.c:60: warning: comparison of distinct pointer types lacks a cast Unfortunately PAGE_SIZE has different types on different architectures.
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 99da6d861c659bb1a961b70f50fad268b9ed5a5f Author: Thomas Hellstrom Date: Mon Mar 19 08:52:17 2007 +1100 drm: allow for more generic drm ioctls Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 6244270ef62203e057191bf85489e2ff91cc0e60 Author: Jay Estabrook <[EMAIL PROTECTED]> Date: Sun Mar 11 11:46:27 2007 +1100 drm: fix alpha domain handling Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 74be8e3b3707956f8f232313de9fad896d5489ac Author: Thomas Hellstrom Date: Sun Mar 11 11:45:24 2007 +1100 via: fix CX700 pci id Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 0bead7cdc94b4897f3d92db6170737a2da527134 Author: Adrian Bunk <[EMAIL PROTECTED]> Date: Sun Mar 11 11:41:16 2007 +1100 drm: make drm_io_prot static. This patch makes the needlessly global drm_io_prot() static. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit 5379397182a7b5fa1c68ceaefe311ce4c1d04b2a Author: Robert P. J. Day <[EMAIL PROTECTED]> Date: Sun Mar 11 11:39:31 2007 +1100 drm: remove via_mm.h Delete apparently unused header file drivers/char/drm/via_mm.h. Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]> Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> commit
Re: [00/17] Large Blocksize Support V3
On Sat, 28 Apr 2007, David Chinner wrote: > > 1-disk and 2-disk read throughput fell by an improbable amount, which makes > > me cautious about the other numbers. > > For read, yes, and it's because something is going wrong with the > I/O size - it looks like readahead thrashing of some kind even > with 4k pages tests. Yup. I seem to have a problem in that area with my patches. Somehow the number of pages is shifted by the page order. I do not completely understand what is going on there yet.
Re: checkpatch, a patch checking script.
On Fri, Apr 27, 2007 at 08:36:17PM -0700, Roland Dreier wrote: >... > Also, it would be nice to be able to do something like > > git diff v2.6.20..|perl ~/checkpatch.pl - >... perl ~/checkpatch.pl <(git diff v2.6.20..) > - R. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed
Re: Kernel oops with 2.6.21 while using cdda2wav & cooked_ioctl (x64-64)
Ross Alexander wrote: Modules linked in: nvidia(P) Tainted:P With this, nobody will even look at your report. Please retry without proprietary modules.
Re: checkpatch, a patch checking script.
> http://www.codemonkey.org.uk/projects/checkpatch/example.log shows > what fell out of running it on my mbox of lkml from the past month. > Some of them are kinda noisy, and perhaps should be moved under --pedantic > > I'm all ears for additional regexps, bug reports or other suggestions. Looks great... however I notice a few obvious false positives in the example log: > Don't init statics to 0/NULL: > 94312:+static const struct in6_addr in6addr_v4mapped = { { { [10] = 0xff, > [11] = 0xff } } }; ummm? > 137054:+static uint32_t drvr_ver = 0x02200207; that ain't zero... > 230079:+path->static_rate = 0; and that ain't a static variable. I guess trying to parse C in a regexp is a little tricky. Also, it would be nice to be able to do something like git diff v2.6.20..|perl ~/checkpatch.pl - rather than having to create a temp file -- as it stands that command produces unknown option: - usage: findbugs.pl [-options] file(s) -allsource : check entire source file, not just '+' patch lines -pedantic : TBD -style : TBD -v, --verbose : verbose -h, --help : this help text Version: 002 And even worse git diff v2.6.20..|perl ~/checkpatch.pl just silently does nothing (maybe a "no input files" warning would be a good clue for people). - R.
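Roland's two requests follow a common Unix convention: treat a lone "-" as standard input, and complain rather than exit silently when no file is named. checkpatch itself is Perl. This C sketch only illustrates the argument handling, and open_input() and check_args() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return a stream for name, treating "-" as standard input.
 * Returns NULL if a named file cannot be opened. */
static FILE *open_input(const char *name)
{
	assert(name != NULL);
	if (strcmp(name, "-") == 0)
		return stdin;
	return fopen(name, "r");
}

/* Returns 0 if at least one input was named; otherwise prints a hint
 * instead of silently doing nothing, and returns 1. */
static int check_args(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "%s: no input files (use - for stdin)\n",
			argc > 0 ? argv[0] : "checkpatch");
		return 1;
	}
	return 0;
}
```

With both behaviours in place, "git diff v2.6.20.. | perl ~/checkpatch.pl -" reads the pipe, and the bare form without "-" at least says why nothing happened.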
Re: [Bugme-new] [Bug 8378] New: Averatec 3156X laptop doesn't reboot with kernels > 2.6.13.5 (responsible commit found)
Andrew Morton wrote (at Fri, 27 Apr 2007 14:44:34 -0700) : > > > On Fri, 27 Apr 2007 10:42:25 -0700 > [EMAIL PROTECTED] wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=8378 >> >>Summary: Averatec 3156X laptop doesn't reboot with kernels > >> 2.6.13.5 (responsible commit found) >> Kernel Version: 2.6.14 till 2.6.21 >> Status: NEW >> Severity: normal >> Owner: [EMAIL PROTECTED] >> Submitter: [EMAIL PROTECTED] >> >> >> Most recent kernel where this bug did *NOT* occur: 2.6.13.5 >> >> Distribution: Debian >> Hardware Environment: Averatec 3156X (seemingly identical to the american >> model >> 3150P) >> Software Environment:? >> Problem Description: >> I noticed that with recent kernels my laptop would reboot when I do an 'init >> 6', >> but hang at the end of the init run. The last working vanilla kernel is >> 2.6.13.5. With some trying and a bit of guessing I found a change to >> include/asm-i386/mach-default/mach_reboot.h in 2.6.14 to be the culprit. It >> can >> be found at: >> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.14.y.git;a=commitdiff;h=59f4e7d572980a521b7bdba74ab71b21f5995538 >> >> On a 2.6.21 source tree I can revert this patch, and then rebooting works. >> >> Steps to reproduce: >> 1) On a Averatec 3156X (or 3150p?) boot to your default runlevel. >> 2) as root, type "init 6". >> 3) instead of rebooting, the system will hang at the end with a blank screen. >> > > Oh dear. We have an ugly i386 snafu here. Thanks for doing the bisection > - it helps enormously. > > Could some brave person please pick it up and see if we can get both > Truxton and Lee's machines working? Hi, I verified on my IDEQ210M that performing the old reboot sequence followed by the new reboot sequence works for me, and I suspect that it will work for Lee also. 
Like this:

	/* old method, works on most machines */
	for (i = 0; i < 100; i++) {
		kb_wait();
		udelay(50);
		outb(0xfe, 0x64);	/* pulse reset low */
		udelay(50);
	}

	/* new method, sets the "System flag" which, when set, indicates
	 * successful completion of the keyboard controller self-test
	 * (Basic Assurance Test, BAT).  This is needed for some machines
	 * with no keyboard plugged in */
	for (i = 0; i < 100; i++) {
		kb_wait();
		udelay(50);
		outb(0x60, 0x64);	/* write Controller Command Byte */
		udelay(50);
		kb_wait();
		udelay(50);
		outb(0x14, 0x60);	/* set "System flag" */
		udelay(50);
		kb_wait();
		udelay(50);
		outb(0xfe, 0x64);	/* pulse reset low */
		udelay(50);
	}

Thanks,
-Truxton
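The combined sequence is easy to check off-line if the port writes go through a hook. In this sketch, kbd_reset_sequence(), outb_fn and the port_log recorder are hypothetical test scaffolding rather than kernel code, and kb_wait()/udelay() are left out. On real hardware, the hook would simply call outb(value, port). The port and command values are the ones in Truxton's sequence (0x64 is the i8042 command port, 0x60 the data port; 0x14 includes bit 2, the System flag).

```c
#include <assert.h>
#include <stddef.h>

#define KBD_DATA_PORT	0x60
#define KBD_CMD_PORT	0x64
#define CMD_WRITE_CCB	0x60	/* write Controller Command Byte */
#define CCB_VALUE	0x14	/* byte written; bit 2 (0x04) is the System flag */
#define CMD_PULSE_RESET	0xfe	/* pulse the reset line low */

/* Port-write hook using the kernel's outb(value, port) argument order. */
typedef void (*outb_fn)(unsigned char value, unsigned short port, void *ctx);

/* Old method first, then the new one, as proposed above. */
static void kbd_reset_sequence(outb_fn out, void *ctx, int tries)
{
	int i;

	assert(out != NULL);
	for (i = 0; i < tries; i++)
		out(CMD_PULSE_RESET, KBD_CMD_PORT, ctx);

	for (i = 0; i < tries; i++) {
		out(CMD_WRITE_CCB, KBD_CMD_PORT, ctx);
		out(CCB_VALUE, KBD_DATA_PORT, ctx);
		out(CMD_PULSE_RESET, KBD_CMD_PORT, ctx);
	}
}

/* Test double: record up to 16 writes instead of touching hardware. */
struct port_log {
	size_t n;
	unsigned char value[16];
	unsigned short port[16];
};

static void log_outb(unsigned char value, unsigned short port, void *ctx)
{
	struct port_log *pl = ctx;

	if (pl->n < 16) {
		pl->value[pl->n] = value;
		pl->port[pl->n] = port;
	}
	pl->n++;
}
```

On a machine that resets on the first 0xfe, none of the later writes matter, which is why running the old loop first should be harmless there.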
Re: [00/17] Large Blocksize Support V3
On Fri, Apr 27, 2007 at 12:11:08PM -0700, Andrew Morton wrote: > On Sat, 28 Apr 2007 03:34:32 +1000 David Chinner <[EMAIL PROTECTED]> wrote: > > > Some more information - stripe unit on the dm raid0 is 512k. > > I have not attempted to increase I/O sizes at all yet - these tests are > > just demonstrating efficiency improvements in the filesystem. > > > > These numbers are for 32GB files.
> >
> >              READ          WRITE
> > disks blksz  tput  sys     tput  sys
> > ----- -----  ----  ---     ----  ---
> >     1    4k    89  18s       57  44s
> >     1   16k    46  13s       67  18s
> >     1   64k    75  12s       68  12s
> >     2    4k   179  20s      114  43s
> >     2   16k    55  13s      132  18s
> >     2   64k   126  12s      126  12s
> >     4    4k   350  20s      214  43s
> >     4   16k   350  14s      264  19s
> >     4   64k   176  11s      266  12s
> >     8    4k   415  21s      446  41s
> >     8   16k   655  13s      518  19s
> >     8   64k   664  12s      552  12s
> >    12    4k   413  20s      633  33s
> >    12   16k   736  14s      741  19s
> >    12   64k   836  12s      743  12s
> >
> > Throughput in MB/s.
> >
> > Consistent improvement across the write results; this is the first time > > I've hit the limits of the PCI-X bus with a single buffered > > I/O thread doing either reads or writes. > > 1-disk and 2-disk read throughput fell by an improbable amount, which makes > me cautious about the other numbers. For read, yes, and it's because something is going wrong with the I/O size - it looks like readahead thrashing of some kind even with 4k pages tests. When I bumped the block device readahead from 256 -> 2048, the single disk read numbers went to 60, 75 and 75MB/s for 4k, 16k and 64k block sizes and were repeatable, so we definitely have some interaction with readahead. > Your annotation says "blocksize". Are you really varying the fs blocksize > here, or did you mean "pagesize"? Filesystem blocksize, as specified by mkfs.xfs. Which, in turn, changes the page cache order. > What worries me here is that we have inefficient code, and increasing the > pagesize amortises that inefficiency without curing it. Increasing the filesystem block size also reduces the overhead of the filesystem, not just the page cache.
A lot of the overhead reductions (writes especially) are going to be related to filesystem block size, so I wouldn't start assuming that it's just the page cache changes that have brought about these system time reductions. > If so, it would be better to fix the inefficiencies, so that 4k pagesize > will also benefit. > > For example, see __do_page_cache_readahead(). It does a read_lock() and a > page allocation and a radix-tree lookup for each page. We can vastly > improve that. Sure, but that's a different problem from what we are trying to solve now. Even with this in place, I think we'd still realise improvements with the compound pages. > Fix up your lameo HBA for reads. Where did that come from? You spend 20 lines describing the inefficiencies of the readahead in the page cache and saying it should be fixed, but then you turn around and say fix the HBA? This test was constructed to keep the I/O sizes within the current bounds, so the HBA sees no difference in I/O sizes as the filesystem block size changes, i.e. the HBA is a constant factor during the tests. IOWs, the changes in numbers above are purely a result of the page cache and filesystem changes. And besides, the "lameo HBA" I'm using is clearly limited by the PCI-X bus it's on, not by the size and type of pages being thrown at it by the I/O layers. The hardware is pretty much irrelevant in these tests. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group
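The claim that a larger filesystem block size cuts per-block work, not only page-cache work, comes down in part to simple counting. Assuming the "32GB" test files are 32 GiB, a 64k block size leaves 16 times fewer blocks to allocate, map and track than 4k. The sketch below does only that arithmetic; it says nothing about how much each per-block operation costs, and blocks_for() is a made-up helper.

```c
#include <assert.h>

/* Number of filesystem blocks needed to cover file_bytes (rounded up). */
static unsigned long long blocks_for(unsigned long long file_bytes,
				     unsigned long long block_bytes)
{
	assert(block_bytes > 0);
	return (file_bytes + block_bytes - 1) / block_bytes;
}
```

For a 32 GiB file this gives 8,388,608 blocks at 4k and 524,288 at 64k.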
Re: [00/17] Large Blocksize Support V3
William Lee Irwin III wrote: >> What sort of strategy do you intend to use to speculatively populate >> the pagecache with contiguous pages? On Sat, Apr 28, 2007 at 12:50:26PM +1000, Nick Piggin wrote: > Andrew outlined it. I'd like to suggest a few straightforward additions to the proposal:
(1) the interface to the page allocator tries to allocate N pages where
    (a) N is a power of 2
    (b) some effort is made to get contiguity
    (c) some effort is made to fall back to lesser contiguity
    (d) some effort is made to get N pages even with no contiguity
(2) a corresponding group freeing interface to the page allocator
(3) pass the pages around in a list or similar so that O(1) instead of O(pages) splice operations under the lock suffice for passing them around.
Dissecting compound pages outside locks helps.
-- wli
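Points (1a) through (1d) amount to a descending-order fallback loop. Here is a user-space sketch under stated assumptions: alloc_order_fn is a hypothetical hook standing in for the buddy allocator, capped_alloc() is a test double that can supply any number of blocks up to a given order, and points (2) and (3) are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook: returns 1 if it handed out one physically
 * contiguous block of 2^order pages, 0 if no block of that order exists. */
typedef int (*alloc_order_fn)(unsigned int order, void *ctx);

/* Gather n pages, n a power of two (1a).  Ask first for a single block of
 * order log2(n) (1b), drop to smaller orders on failure (1c), down to
 * single pages (1d).  Stores the order of each block in orders[] and
 * returns the block count, or -1 on failure.  A real version would give
 * partial results back through a group-free call (point 2). */
static int alloc_pages_bulk(size_t n, alloc_order_fn alloc, void *ctx,
			    unsigned int *orders, size_t max_blocks)
{
	unsigned int order = 0;
	size_t got = 0, nblocks = 0;

	assert(n != 0 && (n & (n - 1)) == 0);
	while (((size_t)1 << order) < n)
		order++;

	/* got stays a multiple of 2^order, so a block never overshoots n. */
	while (got < n) {
		if (alloc(order, ctx)) {
			if (nblocks == max_blocks)
				return -1;
			orders[nblocks++] = order;
			got += (size_t)1 << order;
		} else if (order > 0) {
			order--;
		} else {
			return -1;
		}
	}
	return (int)nblocks;
}

/* Test double: succeeds for any order up to *(unsigned int *)ctx. */
static int capped_alloc(unsigned int order, void *ctx)
{
	return order <= *(unsigned int *)ctx;
}
```

For 8 pages with nothing larger than order 1 available, the loop tries orders 3 and 2, then settles on four order-1 blocks.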
[PATCH -rt] yet another irq storm
Must be global warming, I'm getting a lot more irq storms than usual. Now that I switched over to x86_64, I booted up and got another irq storm. So I added my previous patch and it didn't fix it. Looking further, I found that the mask and unmask are done directly in the x86_64/io_apic.c file. This patch does basically the same thing as my previous patch, but to the x86_64/io_apic.c file. Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> Index: linux-2.6.21-rt1/arch/x86_64/kernel/io_apic.c === --- linux-2.6.21-rt1.orig/arch/x86_64/kernel/io_apic.c +++ linux-2.6.21-rt1/arch/x86_64/kernel/io_apic.c @@ -1437,7 +1437,8 @@ static void ack_apic_level(unsigned int irq_complete_move(irq); #if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE) /* If we are moving the irq we need to mask it */ - if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) { + if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING) && + !(irq_desc[irq].status & IRQ_INPROGRESS)) { do_unmask_irq = 1; mask_IO_APIC_irq(irq); }
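The functional change is the extra IRQ_INPROGRESS test in ack_apic_level(). Its truth table as a tiny C sketch; the flag values are made up for illustration and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this sketch only. */
#define SKETCH_IRQ_INPROGRESS	0x1u
#define SKETCH_IRQ_MOVE_PENDING	0x2u

/* Patched condition: mask the IO-APIC line for a pending move only when
 * the handler is not already in progress for this irq. */
static bool should_mask_for_move(unsigned int status)
{
	return (status & SKETCH_IRQ_MOVE_PENDING) &&
	       !(status & SKETCH_IRQ_INPROGRESS);
}
```

Only the "move pending, handler not in progress" combination masks the line; before the patch, a pending move alone was enough.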
checkpatch, a patch checking script.
On Wed, Apr 25, 2007 at 08:02:07PM -0700, Andrew Morton wrote: > > Yep, I was going to mention your scripts but you beat me to it. > > > > I'll be glad to help maintain such animals if wanted. > > > wanted ;) > > At least, it would be interesting to investigate the usefulness. I suspect > it will prove to be very useful for the little things. Randy and I got together and hashed out a first cut at this. (Randy actually gutted quite a lot of what I originally wrote, so deserves much kudos for improving this beyond my initial crappy version). You can find the script at http://www.codemonkey.org.uk/projects/checkpatch/ There's also a git clonable tree there (only http right now). http://www.codemonkey.org.uk/projects/checkpatch/example.log shows what fell out of running it on my mbox of lkml from the past month. Some of them are kinda noisy, and perhaps should be moved under --pedantic I'm all ears for additional regexps, bug reports or other suggestions. Before wiring this up to a procmail rule to scan every patch, I think it's probably a better idea to flesh it out a bit more. Dave -- http://www.codemonkey.org.uk
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Nick Piggin wrote: What if you were to say remove all the PG_arch_1 code, and do something really simple like flush icache in flush_dcache_page? Would performance suffer horribly? OIC, you need a virtual address to evict the icache, so you can't flush at flush_dcache time? Or does ia64 have an instruction to flush the whole icache? (it would be worth testing, to see how much performance suffers). -- SUSE Labs, Novell Inc.
Re: Back to the future.
On Friday 27 April 2007 21:44:48 Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 03:12, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > It's doubly bad, because that idiocy has also infected s2ram. Again, > > > > another thing that really makes no sense at all - and we do it not > > > > just for snapshotting, but for s2ram too. Can you tell me *why*? > > > > > > Why we freeze tasks at all or why we freeze kernel threads? > > > > In many ways, "at all". > > > > I _do_ realize the IO request queue issues, and that we cannot actually > > do s2ram with some devices in the middle of a DMA. So we want to be able > > to avoid *that*, there's no question about that. And I suspect that > > stopping user threads and then waiting for a sync is practically one of > > the easier ways to do so. > > Apparently I *CANNOT* wrap my head around this - if just because my laptop, running a vendor 2.6.17 kernel does s2ram perfectly, at least, it does when using the "Upstart" init system rather than the classical SysV init system. I have tried it with the classical init and the suspend isn't triggered by the buttons that used to do it. I didn't try 'echo ram > /sys/power/state', but I have a feeling that would have worked as well. I have problems with s2disk, but that's because I keep my swap partition small - I try to keep it at or around 256M when I have more than half a gig of RAM in a system. Perhaps one of these days I'll grab a multi-gig flash disk, set it up as a swap partition and try it again. (every time I've tried s2disk I wind up running out of disk space - and this is with nothing but X running. Any kind of progress meter for when the system is doing s2disk would be nice - every time I've tried it all I see for the nearly 2 minutes before the s2disk attempt ends is a black screen.
I say 2 minutes because that's how long it takes for it to learn that there isn't enough space on the swap-partition to save the image) DRH
commit 45cd8d8e -- why?
The changelog says: fs/sysfs/bin.c: In function 'read': fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int' but the signature of the function read() is read(struct file * file, char __user * userbuf, size_t count, loff_t * off) and git blame seems to show it was always thus -- ie count was always size_t. And now on x86-64 and ia64 with gcc 4.1 at least, I get: fs/sysfs/bin.c: In function 'read': fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument 4 has type 'size_t' Andrew, what compiler were you using to get that warning? Should we revert commit 45cd8d8e? - R.
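For reference, the portable way to print a size_t is the z length modifier: %zu for size_t and %zd for its signed counterpart ssize_t. Because z follows the width of size_t, the same format string is correct on 32-bit i386 and on 64-bit x86-64/ia64, whereas %d only lines up where size_t has the same width as int. A userspace sketch (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Format a size_t with %zu and an ssize_t with %zd; the 'z' modifier
 * tracks the width of size_t on each architecture, so this format
 * string needs no #ifdefs between 32-bit and 64-bit builds. */
static int format_read(char *buf, size_t len, size_t count, ssize_t ret)
{
    return snprintf(buf, len, "count=%zu ret=%zd", count, ret);
}
```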
Re: [00/17] Large Blocksize Support V3
William Lee Irwin III wrote: On Sat, Apr 28, 2007 at 12:27:45PM +1000, Nick Piggin wrote: I guess 10% isn't a small amount. Though it would be nice to have before/after numbers for Linux. And, like Andrew was saying, we could just _attempt_ to put contiguous pages in pagecache rather than _require_ it. Which is still robust under fragmentation, and benefits everyone, not just files with a large pagecache size. What sort of strategy do you intend to use to speculatively populate the pagecache with contiguous pages? Andrew outlined it. -- SUSE Labs, Novell Inc.
Re: [00/17] Large Blocksize Support V3
On Sat, Apr 28, 2007 at 12:27:45PM +1000, Nick Piggin wrote: > I guess 10% isn't a small amount. Though it would be nice to have > before/after numbers for Linux. And, like Andrew was saying, we could > just _attempt_ to put contiguous pages in pagecache rather than > _require_ it. Which is still robust under fragmentation, and benefits > everyone, not just files with a large pagecache size. What sort of strategy do you intend to use to speculatively populate the pagecache with contiguous pages? -- wli
Re: [00/17] Large Blocksize Support V3
Christoph Hellwig wrote: On Fri, Apr 27, 2007 at 10:25:44PM +1000, Nick Piggin wrote: Linus's favourite jokes about powerpc mmu being crippled forever, aside ;) Different mmu. The desktop 32bit mmu Linus referred to has almost nothing in common with the mmu on 64bit systems. Well I wasn't trying to make a point there so it isn't a big deal... but he has been known to say the 64-bit hash table is insane or broken. If he's since recanted, I'd be interested to read the post :) Right this could help but it is not addressing the basic requirement for devices that need large contiguous chunks of memory for I/O. Did you read the last paragraph? Or anything Andrew's been writing? "After that, I'd find it amusing if HBAs worth thousands of $ have trouble looking up sglists at the relatively glacial pace that IO requires, and/or can't spare a few more K for reasonable sglist sizes, but if that is really the case, then we could use iommus and/or just attempt to put physically contiguous pages in pagecache, rather than require it." Real highend HBAs don't have that problem. But for example aacraid which is very common on mid-end servers is a _lot_ faster when it gets contiguous memory. Some benchmark was 10 or more percent faster on windows due to this. And that wasn't due to the 128 sg limit? I guess 10% isn't a small amount. Though it would be nice to have before/after numbers for Linux. And, like Andrew was saying, we could just _attempt_ to put contiguous pages in pagecache rather than _require_ it. Which is still robust under fragmentation, and benefits everyone, not just files with a large pagecache size. -- SUSE Labs, Novell Inc.
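One way to see why merely attempting contiguous pagecache pages still helps an HBA with a fixed sglist size is to count scatter-gather segments: physically adjacent page frames collapse into one entry, while each fragmented page costs an entry of its own. A simplified userspace model (hypothetical function, no per-segment size cap as a real HBA or IOMMU would impose). With 4 KiB pages and the 128-entry limit mentioned above, a fully fragmented request covers at most 512 KiB.

```c
#include <assert.h>
#include <stddef.h>

/* Count the scatter-gather segments needed for a list of page frame
 * numbers, merging runs of physically adjacent pages into one segment.
 * Ignores any maximum segment length a real device would enforce. */
static size_t count_sg_segments(const unsigned long *pfns, size_t n)
{
    size_t segs = 0;

    for (size_t i = 0; i < n; i++) {
        /* A new segment starts unless this page directly follows the previous one. */
        if (i == 0 || pfns[i] != pfns[i - 1] + 1)
            segs++;
    }
    return segs;
}
```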
[PATCH] Add kvasprintf()
Add a kvasprintf() function to complement kasprintf(). [ No in-tree users yet, but I have some coming up. ] Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: Andrew Morton <[EMAIL PROTECTED]> Cc: Keir Fraser <[EMAIL PROTECTED]> --- include/linux/kernel.h |1 + lib/vsprintf.c | 28 2 files changed, 21 insertions(+), 8 deletions(-) === --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -121,6 +121,7 @@ extern int vscnprintf(char *buf, size_t __attribute__ ((format (printf, 3, 0))); extern char *kasprintf(gfp_t gfp, const char *fmt, ...) __attribute__ ((format (printf, 2, 3))); +extern char *kvasprintf(gfp_t gfp, const char *fmt, va_list args); extern int sscanf(const char *, const char *, ...) __attribute__ ((format (scanf, 2, 3))); === --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -851,22 +851,34 @@ EXPORT_SYMBOL(sscanf); /* Simplified asprintf. */ -char *kasprintf(gfp_t gfp, const char *fmt, ...) -{ - va_list ap; +char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap) +{ unsigned int len; char *p; - - va_start(ap, fmt); - len = vsnprintf(NULL, 0, fmt, ap); - va_end(ap); + va_list aq; + + va_copy(aq, ap); + len = vsnprintf(NULL, 0, fmt, aq); + va_end(aq); p = kmalloc(len+1, gfp); if (!p) return NULL; + + vsnprintf(p, len+1, fmt, ap); + + return p; +} + +char *kasprintf(gfp_t gfp, const char *fmt, ...) +{ + va_list ap; + char *p; + va_start(ap, fmt); - vsnprintf(p, len+1, fmt, ap); + p = kvasprintf(gfp, fmt, ap); va_end(ap); + return p; }
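The key detail in kvasprintf() is that the va_list is walked twice, once to measure and once to format, so the measuring pass runs on a va_copy() and leaves the caller's list intact. A userspace sketch of the same pattern, with malloc() standing in for kmalloc() and hypothetical names my_vasprintf()/my_asprintf():

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of kvasprintf(): size the output on a copy of the
 * va_list, allocate, then format for real with the original list. */
static char *my_vasprintf(const char *fmt, va_list ap)
{
    va_list aq;
    int len;
    char *p;

    va_copy(aq, ap);
    len = vsnprintf(NULL, 0, fmt, aq);   /* measuring pass consumes aq only */
    va_end(aq);
    if (len < 0)
        return NULL;

    p = malloc((size_t)len + 1);
    if (!p)
        return NULL;

    vsnprintf(p, (size_t)len + 1, fmt, ap);
    return p;
}

/* Variadic wrapper, in the way kasprintf() now calls kvasprintf(). */
static char *my_asprintf(const char *fmt, ...)
{
    va_list ap;
    char *p;

    va_start(ap, fmt);
    p = my_vasprintf(fmt, ap);
    va_end(ap);
    return p;
}
```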
Re: X display shift with disabled console blanking
On Fri, 2007-04-27 at 18:08 +0100, James Pearson wrote: > I have a problem whereby the X display 'shifts' to left when anything > writes to /dev/console - where console screen blanking has been disabled > i.e. doing something like: > > boot to run level 3 > > If not root, then make sure /dev/console is writeable > > login and type: > > setterm -blank 0 > > start X > > type into an xterm: > > echo "some random text" > /dev/console > (may have to repeat the echo above a few times) > > ... and the whole X display jumps (and wraps) to the left > > I'm using a RHEL4 based distro with a vanilla 2.6.21 x86_64 kernel > (although I've seen the problem with various x86_64 and i686 2.6.X kernels). > > I've seen this problem on a number of different nVidia cards - using > the vesa driver (same problem occurs with nVidia's binary driver). I > haven't tried using other makes of graphics cards. > > > OK, this may be a strange combination of disabling the text console > blanking and running X, but something isn't right somewhere ... Yep, it's strange because I can't reproduce this. And the console write should not succeed if the current console is in KD_GRAPHICS mode, which is done by X (unless your version is different). > > Any ideas? I don't. But, what is your current console? Is it VGA, or framebuffer? Can you try doing this again in both VGA and vesafb? And this does not happen if there is no previous setterm -blank 0 command? Tony
[PATCH] use elfnote.h to generate vsyscall notes
Use existing elfnote.h to generate vsyscall notes, rather than doing it locally. Changes elfnote.h a bit to suit, since this is the first asm user, and it wasn't quite right. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: "Eric W. Biederman" <[EMAIL PROTECTED]> Cc: Roland McGrath <[EMAIL PROTECTED]> --- arch/i386/kernel/vsyscall-note.S | 23 ++- include/linux/elfnote.h | 18 +- 2 files changed, 19 insertions(+), 22 deletions(-) === --- a/arch/i386/kernel/vsyscall-note.S +++ b/arch/i386/kernel/vsyscall-note.S @@ -3,23 +3,12 @@ * Here we can supply some information useful to userland. */ -#include #include +#include -#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type)\ - .section name, flags; \ - .balign 4;\ - .long 1f - 0f; /* name length */ \ - .long 3f - 2f; /* data length */ \ - .long type; /* note type */ \ -0: .asciz vendor; /* vendor name */ \ -1: .balign 4;\ -2: - -#define ASM_ELF_NOTE_END \ -3: .balign 4; /* pad out section */ \ - .previous - - ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0) +/* Ideally this would use UTS_NAME, but using a quoted string here + doesn't work. Remember to change this when changing the + kernel's name. */ +ELFNOTE_START(Linux, 0, "a") .long LINUX_VERSION_CODE - ASM_ELF_NOTE_END +ELFNOTE_END === --- a/include/linux/elfnote.h +++ b/include/linux/elfnote.h @@ -38,17 +38,25 @@ * e.g.
ELFNOTE(XYZCo, 42, .asciz, "forty-two") * ELFNOTE(XYZCo, 12, .long, 0xdeadbeef) */ -#define ELFNOTE(name, type, desctype, descdata)\ -.pushsection .note.name, "",@note ; \ +#define ELFNOTE_START(name, type, flags) \ +.pushsection .note.name, flags,@note ; \ .align 4 ; \ .long 2f - 1f/* namesz */; \ - .long 4f - 3f/* descsz */; \ + .long 4484f - 3f /* descsz */; \ .long type ; \ 1:.asciz #name ; \ 2:.align 4 ; \ -3:desctype descdata; \ -4:.align 4 ; \ +3: + +#define ELFNOTE_END\ +4484:.align 4 ; \ .popsection; + +#define ELFNOTE(name, type, desc) \ + ELFNOTE_START(name, type, "") \ + desc; \ + ELFNOTE_END + #else /* !__ASSEMBLER__ */ #include /*
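The note emitted by ELFNOTE_START/ELFNOTE_END follows the standard ELF note layout: three 32-bit words (namesz, descsz, type), the NUL-terminated name, then the descriptor, with the .align 4 directives padding each part to 4 bytes. A small userspace sketch using Elf32_Nhdr from glibc's <elf.h> (the helper names are hypothetical) computes the resulting size; the vsyscall note, with name "Linux" and a single .long descriptor, comes to 24 bytes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round up to the 4-byte alignment that the .align 4 directives enforce. */
static size_t align4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Bytes occupied by one note: header (namesz, descsz, type), the
 * NUL-terminated name padded to 4, then the descriptor padded to 4. */
static size_t elf_note_size(const char *name, size_t descsz)
{
    return sizeof(Elf32_Nhdr) + align4(strlen(name) + 1) + align4(descsz);
}
```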
Fabric7 VIOC driver going away
It looks like Fabric7 has gone out of business, and the maintainer works elsewhere, so I'm no longer inclined to merge it into the upstream kernel. Yell now, if there is a contingent of Fabric7 users that still want this. Jeff
Re: [00/17] Large Blocksize Support V3
On Thu, Apr 26, 2007 at 11:55:42PM -0700, Andrew Morton wrote: >>> Please address my point: if in five years time x86 has larger or variable >>> pagesize, this code will be a permanent millstone around our necks which we >>> *should not have merged*. >>> And if in five years time x86 does not have larger pagesize support then >>> the manufacturers would have decided that 4k pages are not a performance >>> problem, so we again should not have merged this code. On Fri, 27 Apr 2007 06:44:51 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: >> So the verdict is wait 5 years, see if x86 did anything, and so on. On Fri, Apr 27, 2007 at 12:15:57PM -0700, Andrew Morton wrote: > You missed the bit about "evaluate alternatives". No worries. I'm used to being on the wrong side of things. I'll have no trouble picking out the alternative least likely to be accepted. ;) -- wli
[PATCH] deflate inflate_dynamic too
inflate_dynamic() has piggy stack usage too, so heap allocate it too. I'm not sure it actually gets used, but it shows up large in "make checkstack". Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- lib/inflate.c | 63 ++--- 1 file changed, 42 insertions(+), 21 deletions(-) === --- a/lib/inflate.c +++ b/lib/inflate.c @@ -798,15 +798,18 @@ STATIC int noinline INIT inflate_dynamic unsigned nb; /* number of bit length codes */ unsigned nl; /* number of literal/length codes */ unsigned nd; /* number of distance codes */ -#ifdef PKZIP_BUG_WORKAROUND - unsigned ll[288+32]; /* literal/length and distance code lengths */ -#else - unsigned ll[286+30]; /* literal/length and distance code lengths */ -#endif + unsigned *ll; /* literal/length and distance code lengths */ register ulg b; /* bit buffer */ register unsigned k; /* number of bits in bit buffer */ + int ret; DEBG(" 286 || nd > 30) #endif -return 1; /* bad lengths */ + { +ret = 1; /* bad lengths */ +goto out; + } DEBG("dyn1 "); @@ -850,7 +856,8 @@ DEBG("dyn2 "); { if (i == 1) huft_free(tl); -return i; /* incomplete code set */ +ret = i; /* incomplete code set */ +goto out; } DEBG("dyn3 "); @@ -872,8 +879,10 @@ DEBG("dyn3 "); NEEDBITS(2) j = 3 + ((unsigned)b & 3); DUMPBITS(2) - if ((unsigned)i + j > n) -return 1; + if ((unsigned)i + j > n) { +ret = 1; + goto out; + } while (j--) ll[i++] = l; } @@ -882,8 +891,10 @@ DEBG("dyn3 "); NEEDBITS(3) j = 3 + ((unsigned)b & 7); DUMPBITS(3) - if ((unsigned)i + j > n) -return 1; + if ((unsigned)i + j > n) { +ret = 1; + goto out; + } while (j--) ll[i++] = 0; l = 0; @@ -893,8 +904,10 @@ DEBG("dyn3 "); NEEDBITS(7) j = 11 + ((unsigned)b & 0x7f); DUMPBITS(7) - if ((unsigned)i + j > n) -return 1; + if ((unsigned)i + j > n) { +ret = 1; + goto out; + } while (j--) ll[i++] = 0; l = 0; @@ -923,7 +936,8 @@ DEBG("dyn5b "); error("incomplete literal tree"); huft_free(tl); } -return i; /* incomplete code set */ +ret = i; /* incomplete code set */ +goto out; } DEBG("dyn5c "); bd = dbits; 
@@ -939,15 +953,18 @@ DEBG("dyn5d "); huft_free(td); } huft_free(tl); -return i; /* incomplete code set */ +ret = i; /* incomplete code set */ +goto out; #endif } DEBG("dyn6 "); /* decompress until an end-of-block code */ - if (inflate_codes(tl, td, bl, bd)) -return 1; + if (inflate_codes(tl, td, bl, bd)) { +ret = 1; +goto out; + } DEBG("dyn7 "); @@ -956,10 +973,14 @@ DEBG("dyn7 "); huft_free(td); DEBG(">"); - return 0; - - underrun: - return 4;/* Input underrun */ + ret = 0; +out: + free(ll); + return ret; + +underrun: + ret = 4; /* Input underrun */ + goto out; }
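The inflate_dynamic() change follows the usual single-exit pattern: the 286+30 entry ll[] array moves off the stack onto the heap, and every early return becomes "ret = ...; goto out;" so the buffer is always freed. A stripped-down userspace sketch (hypothetical function; the length check mirrors the nl > 286 || nd > 30 test, while the out-of-memory code and the loop body are placeholders, not the real inflate logic):

```c
#include <assert.h>
#include <stdlib.h>

/* Single-exit cleanup: allocate once, route every failure through 'out'. */
static int decode_lengths(unsigned nl, unsigned nd)
{
    unsigned *ll;
    int ret;

    ll = malloc(sizeof(*ll) * (286 + 30));
    if (ll == NULL)
        return 3;       /* placeholder "out of memory" code */

    if (nl > 286 || nd > 30) {
        ret = 1;        /* bad lengths */
        goto out;
    }

    for (unsigned i = 0; i < nl + nd; i++)
        ll[i] = 0;      /* stand-in for reading the code lengths */
    ret = 0;
out:
    free(ll);           /* reached on success and on every error path */
    return ret;
}
```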
[PATCH] x86: Add a sched_clock paravirt_op
The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. [ This is against Andi's patch queue. It fixes the x86-64 build problem. ] Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: Zachary Amsden <[EMAIL PROTECTED]> Cc: Dan Hecht <[EMAIL PROTECTED]> Cc: john stultz <[EMAIL PROTECTED]> --- arch/i386/kernel/paravirt.c|2 - arch/i386/kernel/sched-clock.c | 43 +-- arch/i386/kernel/vmi.c |2 - arch/i386/kernel/vmiclock.c|6 ++-- include/asm-i386/paravirt.h|7 - include/asm-i386/sched-clock.h | 49 include/asm-i386/timer.h |2 - include/asm-i386/vmi_time.h|2 - include/asm-x86_64/timer.h |2 - 9 files changed, 79 insertions(+), 36 deletions(-) === --- a/arch/i386/kernel/paravirt.c +++ b/arch/i386/kernel/paravirt.c @@ -268,7 +268,7 @@ struct paravirt_ops paravirt_ops = { .write_msr = native_write_msr_safe, .read_tsc = native_read_tsc, .read_pmc = native_read_pmc, - .get_scheduled_cycles = native_read_tsc, + .sched_clock = native_sched_clock, .get_cpu_khz = native_calculate_cpu_khz, .load_tr_desc = native_load_tr_desc, .set_ldt = native_set_ldt, === --- a/arch/i386/kernel/sched-clock.c +++ b/arch/i386/kernel/sched-clock.c @@ -35,29 +35,8 @@ * [EMAIL PROTECTED] "math is hard, lets go shopping!" 
*/ -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ - -struct sc_data { - unsigned cyc2ns_scale; - unsigned unstable; - unsigned long long sync_base; /* TSC or jiffies at syncpoint*/ - unsigned long long ns_base; /* nanoseconds at sync point */ - unsigned long long last_val;/* Last returned value */ -}; - -static DEFINE_PER_CPU(struct sc_data, sc_data) = +DEFINE_PER_CPU(struct sc_data, sc_data) = { .unstable = 1, .sync_base = INITIAL_JIFFIES }; - -static inline u64 cycles_2_ns(struct sc_data *sc, u64 cyc) -{ - u64 ns; - - cyc -= sc->sync_base; - ns = (cyc * sc->cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; - ns += sc->ns_base; - - return ns; -} /* * Scheduler clock - returns current time in nanosec units. @@ -79,7 +58,7 @@ static inline u64 cycles_2_ns(struct sc_ * per CPU. This state is protected against parallel state changes * with interrupts off. */ -unsigned long long sched_clock(void) +unsigned long long native_sched_clock(void) { unsigned long long r; struct sc_data *sc = _cpu_var(sc_data); @@ -98,8 +77,8 @@ unsigned long long sched_clock(void) sc->last_val = r; local_irq_restore(flags); } else { - get_scheduled_cycles(r); - r = cycles_2_ns(sc, r); + rdtscll(r); + r = cycles_2_ns(r); sc->last_val = r; } @@ -107,6 +86,18 @@ unsigned long long sched_clock(void) return r; } + +/* We need to define a real function for sched_clock, to override the + weak default version */ +#ifdef CONFIG_PARAVIRT +unsigned long long sched_clock(void) +{ + return paravirt_sched_clock(); +} +#else +unsigned long long sched_clock(void) + __attribute__((alias("native_sched_clock"))); +#endif /* Resync with new CPU frequency */ static void resync_sc_freq(struct sc_data *sc, unsigned int newfreq) @@ -124,7 +115,7 @@ static void resync_sc_freq(struct sc_dat because sched_clock callers should be able to tolerate small errors. 
*/ sc->ns_base = ktime_to_ns(ktime_get()); - get_scheduled_cycles(sc->sync_base); + rdtscll(sc->sync_base); sc->cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR) / newfreq; } === --- a/arch/i386/kernel/vmi.c +++ b/arch/i386/kernel/vmi.c @@ -890,7 +890,7 @@ static inline int __init activate_vmi(vo paravirt_ops.setup_boot_clock = vmi_time_bsp_init; paravirt_ops.setup_secondary_clock = vmi_time_ap_init; #endif - paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles; + paravirt_ops.sched_clock = vmi_sched_clock;
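cycles_2_ns(), which this patch hoists into a common header, converts TSC cycles to nanoseconds with a fixed-point multiply and shift. One cycle lasts 10^6 / cpu_khz nanoseconds, so the scale is precomputed as (10^6 << CYC2NS_SCALE_FACTOR) / cpu_khz. A userspace sketch with hypothetical helper names, omitting the sync_base/ns_base offsets that the per-CPU sc_data version applies:

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10 /* scale is a 2^10 fixed-point value */

/* ns per cycle = 1e6 / cpu_khz, kept as a fixed-point integer so the
 * hot path is a single multiply and shift. */
static unsigned long cyc2ns_scale_for(unsigned long cpu_khz)
{
    return (1000000UL << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

static uint64_t cycles_to_ns(uint64_t cyc, unsigned long scale)
{
    return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}
```

At 2 GHz (cpu_khz = 2000000) the scale is 512, so 2000 cycles convert to 1000 ns.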
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Hugh Dickins wrote: On Fri, 27 Apr 2007, Nick Piggin wrote: But that's because of ia64's cache coherency implementation. I don't really follow the documentation to know whether it should be one way or the other, but surely it should be done either before or after the set_pte_at, not both. Anyway, how about fremap or mprotect, for example? ... OK, I'm still not sure that I understand why lazy_mmu_prot_update should be used rather than flush_icache_page (in concept, not ia64 implementation). Sure, flush_icache_page isn't given the pte, but let's assume we can change that. You're asking lots of good questions. I wish the ia64 people would know the answers, but from the length of time the "lazy_mmu_prot_update" stuff took to get into the tree, and the length of time it's taken to be found defective, I suspect they don't, and we'll have to guess for them. Some guesses I'm working with... I presume Mike and Anil are correct, that it needs to be done before putting pte into page table, not left until after: but as you've guessed, that needs to be done everywhere, not just in the two places so far identified. When it was discussed last year (in connection with Peter's page cleaning patches) it was thought to be a variant of update_mmu_cache() (after setting pte), and we added the fremap one to accompany it; but now it looks to be a variant of flush_icache_page() (before setting pte). Right. I think. I believe lazy_mmu_prot_update(pteval) came into existence primarily for mprotect's change_pte_range() case. If ia64 filled in its flush_icache_page(vma, page), that could have been used there (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would involve a relatively expensive(?) pte_page() in a place which doesn't need to know the struct page for other cases. Well, I think we could always add a pte argument to flush_icache_page... 
Then, there might be logic to have a flush_lazy_icache_page when changing protections, but that operation (currently called lazy_mmu_prot_update) really doesn't seem like it should be called in all the other places that it is, flush_icache_page should be used for that. But AFAIKS, if we really want correctness, flush_icache_page should go away and be implemented in flush_dcache_page. Well, not pte_page(), it needs to be vm_normal_page() doesn't it? and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid. Some flush_icache_pages are already in place, others are not: do we need to add some? But those architectures which have a non-empty flush_icache_page seem to have survived without the additional calls - so they might be unnecessarily slowed down by additional calls. Well flush_icache seems to be intended solely to bring icache in sync with dcache modifications, but they try to skimp out on most of the flushes required to handle dcache aliases... but really, I don't think that is possible to do 100% correctly. -- SUSE Labs, Novell Inc.
[PATCH] x86: fix PSE pagetable construction
When constructing the initial pagetable in pagetable_init, make sure that non-PSE pmds are updated to PSE ones. This fixes a bug in the paravirt pagetable init code, which otherwise tries to avoid overwriting existing mappings. This moves the definition of pmd_huge() out of the hugetlbfs files into pgtable.h. [ I know Eric would like to make larger changes to the way pagetable init works, but this patch is the minimal fix to an existing bug. ] Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: "H. Peter Anvin" <[EMAIL PROTECTED]> Cc: Eric W. Biederman <[EMAIL PROTECTED]> --- arch/i386/mm/hugetlbpage.c |6 +- arch/i386/mm/init.c |2 +- include/asm-i386/pgtable.h |2 +- include/asm-x86_64/pgtable.h |1 + include/linux/hugetlb.h |2 -- 5 files changed, 4 insertions(+), 9 deletions(-) === --- a/arch/i386/mm/hugetlbpage.c +++ b/arch/i386/mm/hugetlbpage.c @@ -183,6 +183,7 @@ follow_huge_addr(struct mm_struct *mm, u return page; } +#undef pmd_huge int pmd_huge(pmd_t pmd) { return 0; @@ -201,11 +202,6 @@ follow_huge_addr(struct mm_struct *mm, u follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) { return ERR_PTR(-EINVAL); -} - -int pmd_huge(pmd_t pmd) -{ - return !!(pmd_val(pmd) & _PAGE_PSE); } struct page * === --- a/arch/i386/mm/init.c +++ b/arch/i386/mm/init.c @@ -172,7 +172,7 @@ static void __init kernel_physical_mappi /* Map with big pages if possible, otherwise create normal page tables.
*/ if (cpu_has_pse) { unsigned int address2 = (pfn + PTRS_PER_PTE - 1) * PAGE_SIZE + PAGE_OFFSET + PAGE_SIZE-1; - if (!pmd_present(*pmd)) { + if (!pmd_present(*pmd) || !pmd_huge(*pmd)) { if (is_kernel_text(address) || is_kernel_text(address2)) set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_LARGE_EXEC)); else === --- a/include/asm-i386/pgtable.h +++ b/include/asm-i386/pgtable.h @@ -211,7 +211,7 @@ extern unsigned long pg0[]; #define pmd_none(x)(!(unsigned long)pmd_val(x)) #define pmd_present(x) (pmd_val(x) & _PAGE_PRESENT) #definepmd_bad(x) ((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE) - +#define pmd_huge(x)((pmd_val(x) & _PAGE_PSE) != 0) #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT)) === --- a/include/asm-x86_64/pgtable.h +++ b/include/asm-x86_64/pgtable.h @@ -352,6 +352,7 @@ static inline int pmd_large(pmd_t pte) { pmd_index(address)) #define pmd_none(x)(!pmd_val(x)) #define pmd_present(x) (pmd_val(x) & _PAGE_PRESENT) +#define pmd_huge(x)((pmd_val(x) & _PAGE_PSE) != 0) #define pmd_clear(xp) do { set_pmd(xp, __pmd(0)); } while (0) #define pfn_pmd(nr,prot) (__pmd(((nr) << PAGE_SHIFT) | pgprot_val(prot))) #define pmd_pfn(x) ((pmd_val(x) & __PHYSICAL_MASK) >> PAGE_SHIFT) === --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -41,7 +41,6 @@ struct page *follow_huge_addr(struct mm_ int write); struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write); -int pmd_huge(pmd_t pmd); void hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot); @@ -114,7 +113,6 @@ static inline unsigned long hugetlb_tota #define hugetlb_report_node_meminfo(n, buf)0 #define follow_huge_pmd(mm, addr, pmd, write) NULL #define prepare_hugepage_range(addr,len,pgoff) (-EINVAL) -#define pmd_huge(x)0 #define is_hugepage_only_range(mm, addr, len) 0 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; }) #define hugetlb_fault(mm, vma, addr, write)({ BUG(); 0; })
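The one-line change in arch/i386/mm/init.c turns "map only if the pmd is not present" into "map unless it is already a present large page". A userspace model of that predicate, using the x86 bit values for _PAGE_PRESENT (bit 0) and _PAGE_PSE (bit 7); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry bits. */
#define _PAGE_PRESENT 0x001u
#define _PAGE_PSE     0x080u

/* Patched test: (re)write the pmd as a large page unless it is already
 * a present PSE mapping. Before the patch, any present pmd, including
 * one pointing at a 4K page table, was left untouched. */
static bool needs_large_mapping(uint64_t pmd_val)
{
    bool present = (pmd_val & _PAGE_PRESENT) != 0;
    bool huge = (pmd_val & _PAGE_PSE) != 0;

    return !present || !huge;
}
```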
Re: [PATCH] ide-cs: recognize 2GB CompactFlash from Transcend
On Fri, Apr 27, 2007 at 07:01:43PM -0700, Andrew Morton wrote: > This one-liner is turning into a fiasco. > diff -puN > drivers/ide/legacy/ide-cs.c~ide-cs-recognize-2gb-compactflash-from-transcend > drivers/ide/legacy/ide-cs.c > --- > a/drivers/ide/legacy/ide-cs.c~ide-cs-recognize-2gb-compactflash-from-transcend > +++ a/drivers/ide/legacy/ide-cs.c > @@ -401,6 +401,8 @@ static struct pcmcia_device_id ide_ids[] > PCMCIA_DEVICE_PROD_ID12("TOSHIBA", "MK2001MPL", 0xb4585a1a, 0x3489e003), > PCMCIA_DEVICE_PROD_ID1("TRANSCEND512M ", 0xd0909443), > PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS1GCF80", 0x709b1bf1, > 0x2a54d4b1), > + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, > 0xf54a91c8), > + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, > 0x969aa4f2), > PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS4GCF120", 0x709b1bf1, > 0xf54a91c8), > PCMCIA_DEVICE_PROD_ID12("WIT", "IDE16", 0x244e5994, 0x3e232852), > PCMCIA_DEVICE_PROD_ID12("WEIDA", "TWTTI", 0xcc7cf69c, 0x212bb918), > _ > > > Is this really supposed to add a TS2GCF120 entry with the same IDs > as TS4GCF120? That's probably a copy and paste error. 0x969aa4f2 is the correct ID. > And pata_pcmcia-recognize-2gb-compactflash-from-transcend.patch: This one is all right so for what it's worth, it gets: Acked-by: Peter Stuge <[EMAIL PROTECTED]> //Peter
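The slip Andrew caught, a TS2GCF120 entry reusing the TS4GCF120 hash pair, is easy to check mechanically. A hypothetical userspace sketch: the struct below only models the four arguments of PCMCIA_DEVICE_PROD_ID12 and is not the kernel's struct pcmcia_device_id. It flags any two entries whose hash pairs match while their product strings differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one PCMCIA_DEVICE_PROD_ID12 table entry. */
struct prod_id12 {
    const char *id1, *id2;
    uint32_t hash1, hash2;
};

/* Return 1 if two entries with different product strings carry the same
 * hash pair, i.e. a likely copy-and-paste error in the table. */
static int has_conflicting_hashes(const struct prod_id12 *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (t[i].hash1 == t[j].hash1 && t[i].hash2 == t[j].hash2 &&
                (strcmp(t[i].id1, t[j].id1) != 0 ||
                 strcmp(t[i].id2, t[j].id2) != 0))
                return 1;
    return 0;
}
```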
Re: [PATCH] h8300 generic irq
On Thu, 26 Apr 2007 17:34:37 +0900 Yoshinori Sato <[EMAIL PROTECTED]> wrote: > h8300 using generic irq handler patch. > > Signed-off-by: Yoshinori Sato <[EMAIL PROTECTED]> > Minor things: > > --- /dev/null > +++ b/arch/h8300/kernel/irq.c > @@ -0,0 +1,211 @@ > +/* > + * linux/arch/h8300/kernel/irq.c > + * > + * Copyright 2007 Yoshinori Sato <[EMAIL PROTECTED]> > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > + > +/*#define DEBUG*/ > + > +extern unsigned long *interrupt_redirect_table; > +extern const int h8300_saved_vectors[]; > +extern const unsigned long h8300_trap_table[]; > +int h8300_enable_irq_pin(unsigned int irq); > +void h8300_disable_irq_pin(unsigned int irq); Please always avoid putting extern declarations into C files. Put them in a header file which is visible to the definition site as well as all callers/users. For something which is defined in assembly code (like interrupt_redirect_table) it isn't so clear, because we cannot do typechecking. But I think it's still best to include the declaration in a header file so that we only have to declare it once. Plus it _is_ a global symbol. > + > +/* > + * h8300 interrupt controler implementation > + */ > +struct irq_chip h8300irq_chip = { > + .name = "H8300-INTC", > + .startup= h8300_startup_irq, > + .shutdown = h8300_shutdown_irq, > + .enable = h8300_enable_irq, > + .disable= h8300_disable_irq, > + .ack= NULL, > + .end= h8300_end_irq, > +}; I think this could have static scope.
> +void ack_bad_irq(unsigned int irq) > +{ > + printk("unexpected IRQ trap at vector %02x\n", irq); > +} printks should generally have facility levels (KERN_*) > + panic("interrupt vector serup failed."); typo > + for ( i = 0; i < NR_IRQS; i++) { for (i = 0 > + if (i == *saved_vector) { > + ramvec_p++; > + saved_vector++; > + } else { > + if ( i < NR_TRAPS ) { if (i < NR_TRAPS)
Re: [PATCH] ide-cs: recognize 2GB CompactFlash from Transcend
On Thu, 26 Apr 2007 11:21:01 +0200 "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> wrote: > As pointed to by Peter, and also as indicated by a judicious output in > dmesg, the 4th parameter should be 0x969aa4f2. Please find below the > corrected patch: > > Signed-off-by: Fabrice Aeschbacher <[EMAIL PROTECTED]> > > === > --- linux-2.6.20.7-orig/drivers/ide/legacy/ide-cs.c 2007-04-15 > 21:08:02.0 +0200 > +++ linux-2.6.20.7/drivers/ide/legacy/ide-cs.c2007-04-26 > 11:13:13.0 +0200 > @@ -399,6 +399,7 @@ > PCMCIA_DEVICE_PROD_ID12("TOSHIBA", "MK2001MPL", 0xb4585a1a, > 0x3489e003), > PCMCIA_DEVICE_PROD_ID1("TRANSCEND512M ", 0xd0909443), > PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS1GCF80", 0x709b1bf1, > 0x2a54d4b1), > + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, > 0x969aa4f2), > PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS4GCF120", 0x709b1bf1, > 0xf54a91c8), > PCMCIA_DEVICE_PROD_ID12("WIT", "IDE16", 0x244e5994, 0x3e232852), > PCMCIA_DEVICE_PROD_ID12("WEIDA", "TWTTI", 0xcc7cf69c, > 0x212bb918), > === This one-liner is turning into a fiasco. All the top-posting and word-wrapped patches aren't helping :( I presently have two patches. Please check them. ide-cs-recognize-2gb-compactflash-from-transcend.patch: From: "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> Without the following patch, the kernel does not automatically detect 2GB CompactFlash cards from Transcend. 
Signed-off-by: Fabrice Aeschbacher <[EMAIL PROTECTED]> Cc: Dominik Brodowski <[EMAIL PROTECTED]> Cc: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- drivers/ide/legacy/ide-cs.c |2 ++ 1 files changed, 2 insertions(+) diff -puN drivers/ide/legacy/ide-cs.c~ide-cs-recognize-2gb-compactflash-from-transcend drivers/ide/legacy/ide-cs.c --- a/drivers/ide/legacy/ide-cs.c~ide-cs-recognize-2gb-compactflash-from-transcend +++ a/drivers/ide/legacy/ide-cs.c @@ -401,6 +401,8 @@ static struct pcmcia_device_id ide_ids[] PCMCIA_DEVICE_PROD_ID12("TOSHIBA", "MK2001MPL", 0xb4585a1a, 0x3489e003), PCMCIA_DEVICE_PROD_ID1("TRANSCEND512M ", 0xd0909443), PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS1GCF80", 0x709b1bf1, 0x2a54d4b1), + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, 0xf54a91c8), + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, 0x969aa4f2), PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS4GCF120", 0x709b1bf1, 0xf54a91c8), PCMCIA_DEVICE_PROD_ID12("WIT", "IDE16", 0x244e5994, 0x3e232852), PCMCIA_DEVICE_PROD_ID12("WEIDA", "TWTTI", 0xcc7cf69c, 0x212bb918), _ Is this really supposed to add a TS2GCF120 entry with the same IDs as TS4GCF120? And pata_pcmcia-recognize-2gb-compactflash-from-transcend.patch: From: "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> Allow the pata_pcmcia driver to automatically detect 2GB CompactFlash cards from Transcend. 
Signed-off-by: Fabrice Aeschbacher <[EMAIL PROTECTED]> Cc: "Peter Stuge" <[EMAIL PROTECTED]> Acked-by: Alan Cox <[EMAIL PROTECTED]> Cc: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- drivers/ata/pata_pcmcia.c |1 + 1 files changed, 1 insertion(+) diff -puN drivers/ata/pata_pcmcia.c~pata_pcmcia-recognize-2gb-compactflash-from-transcend drivers/ata/pata_pcmcia.c --- a/drivers/ata/pata_pcmcia.c~pata_pcmcia-recognize-2gb-compactflash-from-transcend +++ a/drivers/ata/pata_pcmcia.c @@ -396,6 +396,7 @@ static struct pcmcia_device_id pcmcia_de PCMCIA_DEVICE_PROD_ID12("TOSHIBA", "MK2001MPL", 0xb4585a1a, 0x3489e003), PCMCIA_DEVICE_PROD_ID1("TRANSCEND512M ", 0xd0909443), PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS1GCF80", 0x709b1bf1, 0x2a54d4b1), + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, 0x969aa4f2), PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS4GCF120", 0x709b1bf1, 0xf54a91c8), PCMCIA_DEVICE_PROD_ID12("WIT", "IDE16", 0x244e5994, 0x3e232852), PCMCIA_DEVICE_PROD_ID12("WEIDA", "TWTTI", 0xcc7cf69c, 0x212bb918), _
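Andrew's question (does the new TS2GCF120 entry really carry the same hash as TS4GCF120?) can be checked mechanically. The sketch below assumes that each hash in a PROD_ID12 entry is meant to be computed from its string, so a shared string with different hashes, or different strings with a shared hash, points at a mistyped value. It does not compute the hash itself. The struct, the helper names and the two table copies are hypothetical and only model the TRANSCEND rows quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified stand-in for one PCMCIA_DEVICE_PROD_ID12() entry.
 * This is not the kernel's struct pcmcia_device_id layout. */
struct prod_id12 {
	const char *manf;
	const char *card;
	uint32_t manf_hash;
	uint32_t card_hash;
};

/* Assuming each hash is derived from its string, two fields disagree when
 * they share a string but not a hash, or a hash but not a string (a real
 * hash collision is possible but unlikely). */
static int field_inconsistent(const char *s1, uint32_t h1,
			      const char *s2, uint32_t h2)
{
	int same_str = strcmp(s1, s2) == 0;
	int same_hash = h1 == h2;

	return same_str != same_hash;
}

/* Return the index of the later entry of the first inconsistent pair, or -1. */
static int first_inconsistent_entry(const struct prod_id12 *tbl, size_t n)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (field_inconsistent(tbl[i].manf, tbl[i].manf_hash,
					       tbl[j].manf, tbl[j].manf_hash) ||
			    field_inconsistent(tbl[i].card, tbl[i].card_hash,
					       tbl[j].card, tbl[j].card_hash))
				return (int)i;
	return -1;
}

/* TRANSCEND rows of ide-cs.c as queued, copied from the thread. */
static const struct prod_id12 ide_cs_as_queued[] = {
	{ "TRANSCEND", "TS1GCF80",  0x709b1bf1, 0x2a54d4b1 },
	{ "TRANSCEND", "TS2GCF120", 0x709b1bf1, 0xf54a91c8 },
	{ "TRANSCEND", "TS2GCF120", 0x709b1bf1, 0x969aa4f2 },
	{ "TRANSCEND", "TS4GCF120", 0x709b1bf1, 0xf54a91c8 },
};

/* Same rows with the copy-and-paste entry dropped (as in pata_pcmcia.c). */
static const struct prod_id12 ide_cs_corrected[] = {
	{ "TRANSCEND", "TS1GCF80",  0x709b1bf1, 0x2a54d4b1 },
	{ "TRANSCEND", "TS2GCF120", 0x709b1bf1, 0x969aa4f2 },
	{ "TRANSCEND", "TS4GCF120", 0x709b1bf1, 0xf54a91c8 },
};
```

On the queued table the check flags index 2, the second TS2GCF120 row, because the same card string appears with two different hashes; the corrected table passes.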
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: On Fri, 2007-04-27 at 21:55 +1000, Nick Piggin wrote: That's the theory. However, I'd still like to know how the arch code can make the assertion that icache is known to be at all times other than at the time of a fault? Kernel needs to only worry about the updates that it does. So, if kernel is writing into a page that is getting marked with execute permission then it will need to make sure that caches are coherent. ia64 Kernel keeps track of whether it has done any write operation on a page or not using PG_arch_1. And accordingly flushes icaches. It flushes icache at fault time, I know. What I don't know is why we leave them to drift out of sync afterwards. Ie. what if an operation which causes incoherency is carried out _after_ an executable mapping is installed for that page. You mean by user space? If so, then it is user space responsibility to do the appropriate operations (like flush icache in this case). No, I mean places that set PG_arch_1. flush_dcache_page. This can happen for mapped pages in write, splice, install_arg_page looks questionable, direct IO... Actually there are various windows where mapped pages can be !uptodate, so there is technically most of the filesystem code as well, but I'm trying to stamp those out, so let's ignore that for now. What if you were to say remove all the PG_arch_1 code, and do something really simple like flush icache in flush_dcache_page? Would performance suffer horribly? -- SUSE Labs, Novell Inc.
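The window Nick describes can be shown with a toy model of a single page. In this sketch the `icache_clean` flag loosely plays the role Rohit assigns to PG_arch_1 ("no kernel write since the icache was last made coherent"). The lazy scheme only flushes when an executable mapping is installed, while the eager scheme flushes on every kernel write, which is what Nick suggests doing from flush_dcache_page. All names are hypothetical; this is an illustration, not the ia64 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page (hypothetical, not ia64 code). */
struct toy_page {
	bool icache_clean;  /* bookkeeping flag, loose PG_arch_1 analogue */
	bool icache_stale;  /* ground truth: icache may hold old instructions */
	bool exec_mapped;   /* an executable mapping is installed */
	int flushes;        /* icache flushes performed so far */
};

static void flush_icache(struct toy_page *p)
{
	p->icache_stale = false;
	p->icache_clean = true;
	p->flushes++;
}

/* Lazy scheme: a kernel write only clears the bookkeeping flag. */
static void kernel_write_lazy(struct toy_page *p)
{
	p->icache_stale = true;
	p->icache_clean = false;
}

/* Eager scheme: flush the icache on every kernel write. */
static void kernel_write_eager(struct toy_page *p)
{
	p->icache_stale = true;
	flush_icache(p);
}

/* Fault path: flush if needed, then install the executable mapping. */
static void map_exec(struct toy_page *p)
{
	if (!p->icache_clean)
		flush_icache(p);
	p->exec_mapped = true;
}

/* True when executing through the mapping could see stale instructions. */
static bool exec_sees_stale(const struct toy_page *p)
{
	return p->exec_mapped && p->icache_stale;
}

/* Nick's case: the kernel writes the page after it is already mapped. */
static bool lazy_write_after_map_is_stale(void)
{
	struct toy_page p = { .icache_clean = true };

	map_exec(&p);
	kernel_write_lazy(&p);
	return exec_sees_stale(&p);
}

static bool eager_write_after_map_is_stale(void)
{
	struct toy_page p = { .icache_clean = true };

	map_exec(&p);
	kernel_write_eager(&p);
	return exec_sees_stale(&p);
}
```

In this model the lazy scheme is only coherent when the write happens before the fault. The eager scheme closes the window, but it pays for a flush on every write, whether or not the page is ever executed, which is the performance question Nick raises.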
Re: sky2 regression in 2.6.21: Asus P5B-E Plus ethernet adapter no more supported
Stephen Hemminger wrote:
But the same hardware dies horribly on Gigabyte GA-965P motherboards. Could you send me full lspci -vvx output. I'll re-enable it for Asus and add a block for the Gigabyte boards. (sigh)

To add to the mix, Robert Tate on the same Gentoo bug reports that the Yukon2 hardware on the Gigabyte DQ6 works fine with sky2: https://bugs.gentoo.org/show_bug.cgi?id=176219

03:00.0 0200: 11ab:4364 (rev 12)
    Subsystem: 1458:e000
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
    Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
    Device: Latency L0s unlimited, L1 unlimited
    Device: AtnBtn- AtnInd- PwrInd-
    Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
    Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
    Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
    Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
    Link: Latency L0s <256ns, L1 unlimited
    Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
    Link: Speed 2.5Gb/s, Width x1
    Capabilities: [100] Advanced Error Reporting
00: ab 11 64 43 07 04 10 00 12 00 00 02 08 00 00 00
10: 04 00 00 f7 00 00 00 00 01 70 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 00 e0
30: 00 00 00 00 48 00 00 00 00 00 00 00 0a 01 00 00

Daniel
2.6.21 - BUG: at arch/i386/kernel/smp.c:177 send_IPI_mask_bitmask()
Got this error just before suspend to disk. Suspend/resume works without problems, but I only started seeing this after upgrading to 2.6.21 (no problem with 2.6.21-rc7, I think).

CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS unset
CONFIG_HPET_TIMER=y

ACPI: PCI interrupt for device :00:1b.0 disabled
Disabling non-boot CPUs ...
swsusp: critical section:
swsusp: Need to copy 46874 pages
BUG: at arch/i386/kernel/smp.c:177 send_IPI_mask_bitmask()
[] send_IPI_mask_bitmask+0x52/0xa4
[] tick_do_broadcast_on_off+0x0/0xd3
[] __smp_call_function_single+0x44/0x64
[] tick_do_broadcast_on_off+0x0/0xd3
[] smp_call_function_single+0xc2/0xeb
[] tick_broadcast_on_off+0x48/0x63
[] tick_notify+0x20/0x54
[] notifier_call_chain+0x1b/0x2d
[] clockevents_notify+0x19/0x54
[] acpi_processor_power_verify+0x7d/0x86
[] acpi_processor_get_power_info+0x35/0x6c
[] acpi_processor_cst_has_changed+0x37/0x55
[] acpi_processor_notify+0x4c/0x5f
[] acpi_ev_notify_dispatch+0x52/0x5b
[] acpi_ev_queue_notify_request+0x9e/0xb0
[] acpi_ex_opcode_2A_0T_0R+0x68/0x96
[] acpi_ds_exec_end_op+0xc1/0x386
[] acpi_os_release_object+0x5/0x8
[] acpi_ps_complete_op+0x1cc/0x1db
[] acpi_ps_parse_loop+0x271/0x2a7
[] acpi_os_release_object+0x5/0x8
[] acpi_ps_parse_aml+0x69/0x219
[] acpi_ds_init_aml_walk+0xb3/0x106
[] acpi_ps_execute_method+0xaf/0xe5
[] acpi_ns_evaluate+0x9b/0xf4
[] acpi_evaluate_object+0x14c/0x1f3
[] acpi_leave_sleep_state+0x190/0x26d
[] acpi_pm_finish+0x11/0x40
[] pm_suspend_disk+0x170/0x185
[] enter_state+0x44/0x76
[] state_store+0x87/0x9e
[] state_store+0x0/0x9e
[] subsys_attr_store+0x1c/0x24
[] flush_write_buffer+0x23/0x28
[] sysfs_write_file+0x45/0x67
[] vfs_write+0x8b/0x106
[] sys_write+0x41/0x67
[] syscall_call+0x7/0xb
[] irttp_open_tsap+0x149/0x1d1
===

Thanks,
Jeff.
Re: [00/17] Large Blocksize Support V3
Andrew Morton wrote: On Sat, 28 Apr 2007 03:34:32 +1000 David Chinner <[EMAIL PROTECTED]> wrote: Some more information - stripe unit on the dm raid0 is 512k. I have not attempted to increase I/O sizes at all yet - these test are just demonstrating efficiency improvements in the filesystem. These numbers for 32GB files.

                 READ        WRITE
disks  blksz    tput  sys    tput  sys
-----  -----    ----  ---    ----  ---
    1     4k      89  18s      57  44s
    1    16k      46  13s      67  18s
    1    64k      75  12s      68  12s
    2     4k     179  20s     114  43s
    2    16k      55  13s     132  18s
    2    64k     126  12s     126  12s
    4     4k     350  20s     214  43s
    4    16k     350  14s     264  19s
    4    64k     176  11s     266  12s
    8     4k     415  21s     446  41s
    8    16k     655  13s     518  19s
    8    64k     664  12s     552  12s
   12     4k     413  20s     633  33s
   12    16k     736  14s     741  19s
   12    64k     836  12s     743  12s

Throughput in MB/s. Consistent improvement across the write results, first time I've hit the limits of the PCI-X bus with a single buffered I/O thread doing either reads or writes.

1-disk and 2-disk read throughput fell by an improbable amount, which makes me cautious about the other numbers. Your annotation says "blocksize". Are you really varying the fs blocksize here, or did you mean "pagesize"? What worries me here is that we have inefficient code, and increasing the pagesize amortises that inefficiency without curing it. If so, it would be better to fix the inefficiencies, so that 4k pagesize will also benefit. For example, see __do_page_cache_readahead(). It does a read_lock() and a page allocation and a radix-tree lookup for each page. We can vastly improve that. Step 1:

- do a read-lock
- do a radix-tree walk to work out how many pages are missing
- read-unlock
- allocate that many pages
- read_lock()
- populate all the pages.
- read_unlock
- if any pages are left over, free them
- if we ended up not having enough pages, redo the whole thing.

That will reduce the number of read_lock()s, read_unlock()s and radix-tree descents by a factor of 32 or so in this testcase. That's a lot, and it's something we (Nick ;)) should have done ages ago.
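Andrew's step list can be modelled in userspace to show where the savings come from. In the sketch below, an array of slots stands in for the page cache, `lock_ops` counts lock/unlock pairs and `lookups` counts radix-tree descents. Counting the whole hole-counting walk as a single descent is an assumption of the model, and none of these names come from mm/readahead.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy page cache: slot[i] != NULL means page i is cached (hypothetical). */
struct toy_cache {
	void **slot;
	size_t nr_slots;
	unsigned lock_ops;  /* read_lock()/read_unlock() pairs */
	unsigned lookups;   /* radix-tree descents */
};

static struct toy_cache toy_cache_new(size_t nr_slots)
{
	struct toy_cache c = { calloc(nr_slots, sizeof(void *)), nr_slots, 0, 0 };

	assert(c.slot != NULL);
	return c;
}

/* Current shape: lock, look up and allocate once per page. */
static size_t readahead_per_page(struct toy_cache *c, size_t start, size_t nr)
{
	size_t added = 0;

	assert(start + nr <= c->nr_slots);
	for (size_t i = start; i < start + nr; i++) {
		c->lock_ops++;              /* read_lock() ... read_unlock() */
		c->lookups++;               /* one radix-tree descent */
		if (c->slot[i] != NULL)
			continue;
		c->slot[i] = malloc(1);     /* stands in for a page allocation */
		if (c->slot[i] == NULL)
			break;
		added++;
	}
	return added;
}

/* Batched shape: count holes under one lock, allocate unlocked, populate
 * under a second lock, free leftovers, and redo if we ran short. */
static size_t readahead_batched(struct toy_cache *c, size_t start, size_t nr)
{
	size_t added = 0;
	bool short_of_pages = false;

	assert(start + nr <= c->nr_slots);
	do {
		size_t missing = 0, got = 0, used = 0;

		short_of_pages = false;
		c->lock_ops++;
		c->lookups++;               /* one walk over the whole range */
		for (size_t i = start; i < start + nr; i++)
			if (c->slot[i] == NULL)
				missing++;
		if (missing == 0)
			break;

		void **pool = malloc(missing * sizeof(*pool));
		if (pool == NULL)
			break;
		while (got < missing && (pool[got] = malloc(1)) != NULL)
			got++;

		c->lock_ops++;
		for (size_t i = start; i < start + nr; i++) {
			if (c->slot[i] != NULL)
				continue;
			if (used == got) {
				short_of_pages = true;
				break;
			}
			c->slot[i] = pool[used++];
		}
		added += used;
		for (size_t k = used; k < got; k++)
			free(pool[k]);          /* leftovers */
		free(pool);
		if (got < missing)
			break;                  /* allocation failed: give up, don't spin */
	} while (short_of_pages);
	return added;
}
```

For 32 empty pages the per-page path takes 32 lock pairs and 32 descents, while the batched path takes 2 lock pairs and 1 walk. In this single-threaded model the "redo" branch never fires; it only matters when another CPU fills holes between the two locked passes.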
We can do pretty well with the lockless radix tree (that is already upstream) there. I split that stuff out of my most recent lockless pagecache patchset, because it doesn't require the "scary" speculative refcount stuff of the lockless pagecache proper. Subject: [patch 5/9] mm: lockless probe. So that is something we could merge pretty soon. The other thing is that we can batch up pagecache page insertions for bulk writes as well (that is, write(2) with buffer size > page size). I should have a patch somewhere for that as well if anyone is interested. -- SUSE Labs, Novell Inc.
Re: 2.6.21-rc7-mm1: BUG_ON in kthread_bind during _cpu_down
On Thu, 26 Apr 2007 18:28:38 +0530 Gautham R Shenoy <[EMAIL PROTECTED]> wrote: > I just checked with Vatsa if there was any subtle reason why they > had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect > any and I can't see any. So let us just remove the kthread_bind. > > Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]> > --- > kernel/cpu.c |4 > 1 files changed, 4 deletions(-) > > Index: linux-2.6.21-rc7/kernel/cpu.c > === > --- linux-2.6.21-rc7.orig/kernel/cpu.c > +++ linux-2.6.21-rc7/kernel/cpu.c > @@ -176,10 +176,6 @@ static int _cpu_down(unsigned int cpu, i > /* This actually kills the CPU. */ > __cpu_die(cpu); > > - /* Move it here so it can run. */ > - kthread_bind(p, get_cpu()); > - put_cpu(); > - > /* CPU is completely dead: tell everyone. Too late to complain. */ > if (raw_notifier_call_chain(_chain, CPU_DEAD | mod, > hcpu) == NOTIFY_BAD) So I cooked up a changelog and queued up the diff. But I have an uneasy feeling that things are getting a bit close to guesswork here. We have a huge amount of change pending in the kthread/workqueue/freezer area, partly because I decided not to merge most of the workqueue changes into 2.6.21. It'd be good if people could take some time to sit down and re-review the code which we presently have. I plan on sending it all off for 2.6.22 and there might be some glitches but it seems to have a good track record so far.
Re: Question about Reiser4 (how to boot it?)
On 4/28/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Thanks, that is certainly helpful, but that only mounts one directory (partition) as Reiser4. This I have already done. I was more interested in how to have a whole partition dedicated to Reiser4 and being able to boot into it. I'm not able to boot from a whole Reiser4 partition with grub2 either. I've seen a patch for grub: ftp://ftp.namesys.com/pub/reiser4progs/grub-0.97-reiser4-20050808.tar.gz But since I'm using grub2, it's not possible to boot directly into reiser4. I only use the whole 250GB partition on my 2nd hard disk for testing. I'm as interested as you are in finding a way for grub2 to boot it directly. Currently, I have to create a small ext2 partition for grub2. Jeff.
Re: Back to the future.
On Saturday, 28 April 2007 03:12, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > It's doubly bad, because that idiocy has also infected s2ram. Again, > > > another thing that really makes no sense at all - and we do it not just > > > for snapshotting, but for s2ram too. Can you tell me *why*? > > > > Why we freeze tasks at all or why we freeze kernel threads? > > In many ways, "at all". > > I _do_ realize the IO request queue issues, and that we cannot actually do > s2ram with some devices in the middle of a DMA. So we want to be able to > avoid *that*, there's no question about that. And I suspect that stopping > user threads and then waiting for a sync is practically one of the easier > ways to do so. > > So in practice, the "at all" may become a "why freeze kernel threads?" and > freezing user threads I don't find really objectionable. > > But as Paul pointed out, Linux on the old powerpc Mac hardware was > actually rather famous for having working (and reliable) suspend long > before it worked even remotely reliably on PC's. And they didn't do even > that. > > (They didn't have ACPI, and they had a much more limited set of devices, > but the whole process freezer is really about neither of those issues. The > wild and wacky PC hardware has its problems, but that's _one_ thing we > can't blame PC hardware for ;) We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think. > > > git grep create_freezeable_workthread > > > > s/workthread/workqueue/ > > Yes. > > > > and ponder the end results of that grep. If you don't see something > > > wrong, > > > you're blind. > > > > This was a mistake, quite unrelated to the point you're making. > > Did you actually _do_ the "grep" (with the fixed argument)? > > I had two totally independent points. 
#1 was that you yourself have been > fixing bugs in this area. #2 was the result of that grep. It's absolutely > _empty_ except for the define to add that interface. > > NOBODY USES IT! The reason is pretty simple. We wanted to drop that interface altogether, because it was broken (my fault), but Oleg suggested that we keep it so that we could fix and use it in the future (for purposes other than hibernation, though). > Now, grep for the same interface that creates _non_freezeable workqueues. > > Put another way: > > [EMAIL PROTECTED] linux]$ git grep create_workqueue | wc -l > 35 > > [EMAIL PROTECTED] linux]$ git grep create_freezeable_workqueue | wc -l > 1 > > and that _one_ hit you get for the "freezeable" case is not actually a > user, it's the definition! > > Ie my point is, nobody wants freezeable kernel threads. Absolutely nobody. That's freezable workqueues only. :-) > Yet we have all this support for freezing them (or rather, we freeze them > by default, and then we have all this support for _not_ doing that wrong > default thing!) > > So yes, I think it would be interesting to just stop freezing kernel > threads. Totally. Okay, I'll do that. Greetings, Rafael
[PATCH -mm] Allow selective freezing of the system for different events
This patch:
* Provides an interface to selectively freeze the system for different events.
* Allows tasks to exempt themselves or other tasks from specific freeze events.
* Allows nesting of freezer calls. For example:

freeze_processes(EVENT_A);
/* Do something with respect to event A */
. . .
freeze_processes(EVENT_B);
/* Do something with respect to event B */
. . .
thaw_processes(EVENT_B);
. . .
thaw_processes(EVENT_A);

This type of behaviour will be required once CPU hotplug starts using the process freezer, where EVENT_A would be SUSPEND and EVENT_B would be HOTPLUG_CPU.

This patch applies on top of 2.6.21-rc7-mm2 + Rafael's freezer changes from http://lkml.org/lkml/2007/4/27/302.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]> --- arch/i386/kernel/apm.c |2 - drivers/block/loop.c|2 - drivers/char/apm-emulation.c|6 +-- drivers/ieee1394/ieee1394_core.c|2 - drivers/md/md.c |2 - drivers/mmc/card/queue.c|2 - drivers/mtd/mtd_blkdevs.c |2 - drivers/scsi/libsas/sas_scsi_host.c |2 - drivers/scsi/scsi_error.c |2 - drivers/usb/storage/usb.c |2 - include/linux/freezer.h | 44 +++- kernel/freezer.c| 64 ++-- kernel/kprobes.c|4 +- kernel/kthread.c|2 - kernel/power/disk.c |4 +- kernel/power/main.c |8 ++-- kernel/power/user.c |6 +-- kernel/rcutorture.c |4 +- kernel/sched.c |2 - kernel/softirq.c|2 - kernel/softlockup.c |2 - kernel/workqueue.c |2 - 22 files changed, 119 insertions(+), 49 deletions(-) Index: linux-2.6.21-rc7/include/linux/freezer.h === --- linux-2.6.21-rc7.orig/include/linux/freezer.h +++ linux-2.6.21-rc7/include/linux/freezer.h @@ -4,17 +4,27 @@ #ifdef CONFIG_FREEZER + /* * Per task flags used by the freezer * * They should not be referred to directly outside of this file.
*/ -#define TFF_NOFREEZE 0 /* task should not be frozen */ +#define TFF_FE_SUSPEND 0 /* Do not freeze task for software suspend */ +#define TFF_FE_KPROBES 1 /* Do not freeze task for kprobes */ #define TFF_FREEZE 8 /* task should go to the refrigerator ASAP */ #define TFF_SKIP 9 /* do not count this task as freezable */ #define TFF_FROZEN 10 /* task is frozen */ /* + * Codes of different events which use the freezer + * These are the only flags that can be referred outside this file + */ +#define FE_SUSPEND (1 << TFF_FE_SUSPEND) /* Software Suspend */ +#define FE_KPROBES (1 << TFF_FE_KPROBES) /* Kprobes */ +#define FE_ALL (FE_SUSPEND | FE_KPROBES) /* All events using freezer */ + +/* * Check if a process has been frozen */ static inline int frozen(struct task_struct *p) @@ -57,19 +67,29 @@ static inline void clear_freeze_flag(str } /* - * Check if the task wants to be exempted from freezing + * Check if the task wants to be exempted from freezing for + * freeze_event. */ -static inline int freezer_should_exempt(struct task_struct *p) +static inline int freezer_should_exempt(struct task_struct *p, + unsigned long freeze_event) { - return test_bit(TFF_NOFREEZE, >freezer_flags); + return p->freezer_flags & freeze_event; } /* * Tell the freezer to exempt this task from freezing + * for events in freeze_event_mask. 
*/ -static inline void freezer_exempt(struct task_struct *p) +static inline void freezer_exempt(struct task_struct *p, + unsigned long freeze_event_mask) +{ + atomic_set_mask(freeze_event_mask, >freezer_flags); +} + +/* Returns the mask of the events for which this process is freezeable */ +static inline unsigned long freezeable_event_mask(struct task_struct *p) { - set_bit(TFF_NOFREEZE, >freezer_flags); + return ~p->freezer_flags & FE_ALL; } /* @@ -96,8 +116,8 @@ static inline int thaw_process(struct ta } extern void refrigerator(void); -extern int freeze_processes(void); -extern void thaw_processes(void); +extern int freeze_processes(unsigned long freeze_event); +extern void thaw_processes(unsigned long freeze_event); static inline int try_to_freeze(void) { @@ -160,11 +180,15 @@ static inline int freezing(struct task_s static inline void freeze(struct task_struct *p) { BUG(); } static inline int freezer_should_exempt(struct task_struct *p) { return 0; } static inline void freezer_exempt(struct task_struct *p) {} +static inline unsigned long
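The per-event exemption bits in this patch can be exercised outside the kernel. The sketch below copies the patch's bit positions (TFF_FE_SUSPEND, TFF_FE_KPROBES) and mirrors its freezer_exempt(), freezer_should_exempt() and freezeable_event_mask() helpers on a plain `unsigned long`, without atomics. The `toy_task`/`toy_freezer` types, and the nesting rule that a task stays frozen while any active event applies to it, are assumptions of this model and are not taken from the patch's kernel/freezer.c changes.

```c
#include <assert.h>

/* Bit positions copied from the patch; everything else is a toy model. */
#define TFF_FE_SUSPEND 0
#define TFF_FE_KPROBES 1
#define FE_SUSPEND (1UL << TFF_FE_SUSPEND)
#define FE_KPROBES (1UL << TFF_FE_KPROBES)
#define FE_ALL     (FE_SUSPEND | FE_KPROBES)

struct toy_task {
	unsigned long freezer_flags;   /* set bit = exempt from that event */
};

struct toy_freezer {
	unsigned long active_events;   /* events with a freeze in progress */
};

/* Plain OR here; the patch uses atomic_set_mask(). */
static void freezer_exempt(struct toy_task *p, unsigned long mask)
{
	p->freezer_flags |= mask;
}

static int freezer_should_exempt(const struct toy_task *p, unsigned long event)
{
	return (p->freezer_flags & event) != 0;
}

/* Events for which this task can be frozen. */
static unsigned long freezeable_event_mask(const struct toy_task *p)
{
	return ~p->freezer_flags & FE_ALL;
}

static void freeze_processes(struct toy_freezer *f, unsigned long event)
{
	f->active_events |= event;
}

static void thaw_processes(struct toy_freezer *f, unsigned long event)
{
	f->active_events &= ~event;
}

/* Assumed nesting rule: thawing an inner event does not thaw a task that
 * is still frozen for an outer one. */
static int task_is_frozen(const struct toy_freezer *f, const struct toy_task *p)
{
	return (f->active_events & freezeable_event_mask(p)) != 0;
}
```

With FE_SUSPEND as the outer event and FE_KPROBES as the inner one, thawing the inner event leaves ordinary tasks frozen until the outer thaw, and a task exempted from FE_KPROBES is never held by that event alone.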
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 2007-04-27 at 15:18 +0100, Hugh Dickins wrote: > I presume Mike and Anil are correct, that it needs to be done before > putting pte into page table, not left until after: but as you've > guessed, that needs to be done everywhere, not just in the two > places so far identified. > That sounds about right. Before installing new mapping, kernel should ensure there are no stale contents in caches or TLB. lazy_mmu_prot_update needs to be called whenever the permissions on pte (about to) change. So if remapping is causing change in protection then lazy_mmu_prot_update needs to be called. > When it was discussed last year (in connection with Peter's page > cleaning patches) it was thought to be a variant of update_mmu_cache() > (after setting pte), and we added the fremap one to accompany it; > but now it looks to be a variant of flush_icache_page() (before > setting pte). > > I believe lazy_mmu_prot_update(pteval) came into existence primarily > for mprotect's change_pte_range() case. Yup. > If ia64 filled in its > flush_icache_page(vma, page), that could have been used there > (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would > involve a relatively expensive(?) pte_page() in a place which doesn't > need to know the struct page for other cases. > > Well, not pte_page(), it needs to be vm_normal_page() doesn't it? > and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid. > > Some flush_icache_pages are already in place, others are not: do > we need to add some? But those architectures which have a non-empty > flush_icache_page seem to have survived without the additional calls > - so they might be unnecessarily slowed down by additional calls. > Right. Extra flush_icache_page routines will add cost to archs that have non-null definition of this routine. BTW, isn't flush_icache_page marked for deprecation? 
> I believe that was the secondary reason for lazy_mmu_prot_update(), > perhaps better called ia64_flush_icache_page(): to allow calls to > be added where ia64 was (mistakenly) thought to want them, without > needing a protracted audit of how other architectures might be > impacted. > lazy_mmu_prot_update was added specifically for notifying a change in protection. So, in a way it is closer to update_mmu_cache (which is for changes in the mappings themselves). In the ia64 implementation, though, it ends up flushing the icaches when needed. Hopefully my reply is useful. -rohit
Re: Back to the future.
On Fri, 27 Apr 2007, Linus Torvalds wrote: On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: It's doubly bad, because that idiocy has also infected s2ram. Again, another thing that really makes no sense at all - and we do it not just for snapshotting, but for s2ram too. Can you tell me *why*? Why we freeze tasks at all or why we freeze kernel threads? In many ways, "at all". I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so. So in practice, the "at all" may become a "why freeze kernel threads?" and freezing user threads I don't find really objectionable. there was a thread last week (or so) about splitting up the process list, one list for normal user processes, one for kernel threads, and one for dead processes waiting to be reaped. it almost sounds like what you want to do is to act as if the normal user threads weren't there for a short time (while you make the snapshot) and then recover them to continue and save the snapshot. David Lang
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: On Saturday, 28 April 2007 03:03, Kyle Moffett wrote: On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: Hi. On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote: It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do gdb -p Make the machine being suspended a VM and you can already do that. when something goes wrong?) but we also *depend* on user space for various things (the same way we depend on kernel threads, and why it has been such a total disaster to try to freeze the kernel threads too!). For example, if you want to do graphical stuff, just using X would be quite nice, wouldn't it? But in doing so you make the contents of the disk inconsistent with the state you've just snapshotted, leading to filesystem corruption. Even if you modify filesystems to do checkpointing (which is what we're really talking about), you still also have the problem that your snapshot has to be stored somewhere before you write it to disk, so you also have to either [snip] Actually, it's a lot simpler than that. We can just combine the device-mapper snapshot with a VM+kernel snapshot system call and be almost done: sys_snapshot(dev_t snapblockdev, int __user *snapshotfd); When sys_snapshot is run, the kernel does: 1) Sequentially freeze mounted filesystems using blockdev freezing. If it's an fs that doesn't support freezing then either fail or force- remount-ro that fs and downgrade all its filedescriptors to RO. Doesn't need extra locking since process which try to do IO either succeed before the freeze call returns for that blockdev or sleep on the unfreeze of that blockdev. Filesystems are synchronized and made clean. 2) Iterate over the userspace process list, freezing each process and remapping all of its pages copy-on-write. Any device-specific pages need to have state saved by that device. Why do you want to do 2) after 1) and not vice versa? it doesn't really need to matter. 
if you care, just arrange to not schedule user processes while you are doing both steps. 3) All processes (except kernel threads) are now frozen. 4) Kernel should save internal state corresponding to current userspace state. The kernel also swaps out excess pages to free up enough RAM and prepares the snapshot file-descriptor with copies of kernel memory and the original (pre-COW) mapped userspace pages. 5) Kernel substitutes filesystems for either a device-mapper snapshot with snapblockdev as backing storage or union with tmpfs and remounts the underlying filesystems as read-only. 6) Kernel unfreezes all userspace processes and returns the snapshot FD to userspace (where it can be read from). Okay, but how do we do the error recovery if, for example, the image cannot be saved? give the user an error message telling him this, wait for confirmation, and then jump directly to the restore step. revert everything to the snapshot image(s), restart it. David Lang
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 2007-04-27 at 21:55 +1000, Nick Piggin wrote: > That's the theory. However, I'd still like to know how the arch code can > make the assertion that icache is known to be at all times other than at > the time of a fault? > The kernel only needs to worry about the updates that it does itself. So, if the kernel is writing into a page that is getting marked with execute permission, then it needs to make sure that the caches are coherent. The ia64 kernel keeps track of whether it has done any write operation on a page using PG_arch_1, and flushes the icache accordingly. > Ie. what if an operation which causes incoherency is carried out _after_ > an executable mapping is installed for that page. > You mean by user space? If so, then it is user space's responsibility to do the appropriate operations (like flushing the icache in this case). -rohit
Re: Back to the future.
On Apr 27, 2007, at 21:15:28, Rafael J. Wysocki wrote: On Saturday, 28 April 2007 03:03, Kyle Moffett wrote: On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: But in doing so you make the contents of the disk inconsistent with the state you've just snapshotted, leading to filesystem corruption. Even if you modify filesystems to do checkpointing (which is what we're really talking about), you still also have the problem that your snapshot has to be stored somewhere before you write it to disk, so you also have to either [snip] When sys_snapshot is run, the kernel does: 1) Sequentially freeze mounted filesystems using blockdev freezing. If it's an fs that doesn't support freezing then either fail or force-remount-ro that fs and downgrade all its filedescriptors to RO. Doesn't need extra locking since process which try to do IO either succeed before the freeze call returns for that blockdev or sleep on the unfreeze of that blockdev. Filesystems are synchronized and made clean. 2) Iterate over the userspace process list, freezing each process and remapping all of its pages copy-on-write. Any device-specific pages need to have state saved by that device. Why do you want to do 2) after 1) and not vice versa? (1) can be done without extra locking. Device-mapper already has code to freeze filesystems and that makes a natural process-stopping point. Any threads doing IO will very quickly put themselves to sleep at (1) and save us some effort during step 2. 6) Kernel unfreezes all userspace processes and returns the snapshot FD to userspace (where it can be read from). Okay, but how do we do the error recovery if, for example, the image cannot be saved? If the image can't be saved then there are 2 options: (1) Call sys_restore() with the image (2) Pass your snapshot file-descriptor to sys_unsnapshot() In the former case, the system will be restored to the state it was at a few seconds earlier, right as it took the snapshot. 
In the latter case the modified-in-memory snapshot pages will be synced back to the disk filesystems, the copy-on-write data-structures torn down (think of merging an LVM snapshot back into its base device), and the memory allocated for the snapshot will be freed. Either way the system is properly in sync with disk again; the only difference is whether you want to preserve the userspace state from during the attempted snapshot (IE: any error status). You could also save the error state in case (1) by just auto-posting a bug-report on http://bugs.$VENDOR.com/ of course :-D.

Cheers,
Kyle Moffett
NR_UNSTABLE_FS vs. NR_FILE_DIRTY: double counting pages?
There are several places where we add together NR_UNSTABLE_NFS and NR_FILE_DIRTY:

	sync_inodes_sb()
	balance_dirty_pages()
	wakeup_pdflush()
	wb_kupdate()
	prefetch_suitable()

I can trace a standard codepath where it seems both of these are set on the same page:

	nfs_file_aops.commit_write ->
		nfs_commit_write
		nfs_updatepage
		nfs_writepage_setup
		nfs_wb_page
		nfs_wb_page_priority
		nfs_writepage_locked
		nfs_flush_mapping
		nfs_flush_list
		nfs_flush_multi
	nfs_write_partial_ops.rpc_call_done
		nfs_writeback_done_partial
		nfs_writepage_release
		nfs_reschedule_unstable_write
		nfs_mark_request_commit
		incr NR_UNSTABLE_NFS

	nfs_file_aops.commit_write ->
		nfs_commit_write
		nfs_updatepage
		__set_page_dirty_nobuffers
		incr NR_FILE_DIRTY

This is the standard code path that derives from sys_write(). Can someone either show how this code sequence can't happen, or confirm for me that there's a bug?

--
Ethan
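A toy model shows the concern. If one page is accounted in both counters, any heuristic that sums them over-reports by one page for each doubly-accounted page. This is a user-space simulation only: `struct sim_page`, its two flag fields, `dirty_estimate()` and `distinct_pages()` are hypothetical. They mirror the shape of the sums in `balance_dirty_pages()` and the other callers listed above, not their actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_page {
    bool dirty;      /* counted in the NR_FILE_DIRTY-like counter */
    bool unstable;   /* counted in the NR_UNSTABLE_NFS-like counter */
};

/* What the callers compute: the plain sum of the two counters. */
static size_t dirty_estimate(const struct sim_page *pages, size_t n)
{
    size_t file_dirty = 0, unstable_nfs = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].dirty)
            file_dirty++;
        if (pages[i].unstable)
            unstable_nfs++;
    }
    return file_dirty + unstable_nfs;
}

/* Number of distinct pages that actually still need writeback or commit. */
static size_t distinct_pages(const struct sim_page *pages, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (pages[i].dirty || pages[i].unstable)
            count++;
    return count;
}
```

With pages `{dirty}`, `{dirty, unstable}` and `{unstable}`, the sum is 4 but only 3 distinct pages are outstanding. The middle page, present in both counters, is the over-count in question, if the trace above is right.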
Re: Back to the future.
Nigel Cunningham nigel.suspend2.net> writes: > 4) uswsusp and swsusp get dropped and Suspend2 goes into mainline. After reading most of this thread, it seems that Linus is of the view that all three of these suck in one way or another. Suspend2 has the most features and is the fastest of the lot. It can behave like swsusp from the user's point of view (i.e. echo disk > /sys/power/state), so the migration should be seamless for most distros. It isn't complicated to set up. It's been proven in the field. It looks pretty. So, while we're waiting for the next STD technology, why not have the best and develop from there? -- Bojan
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > It's doubly bad, because that idiocy has also infected s2ram. Again, > > another thing that really makes no sense at all - and we do it not just > > for snapshotting, but for s2ram too. Can you tell me *why*? > > Why we freeze tasks at all or why we freeze kernel threads? In many ways, "at all". I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so. So in practice, the "at all" may become a "why freeze kernel threads?" and freezing user threads I don't find really objectionable. But as Paul pointed out, Linux on the old powerpc Mac hardware was actually rather famous for having working (and reliable) suspend long before it worked even remotely reliably on PC's. And they didn't do even that. (They didn't have ACPI, and they had a much more limited set of devices, but the whole process freezer is really about neither of those issues. The wild and wacky PC hardware has its problems, but that's _one_ thing we can't blame PC hardware for ;) > > git grep create_freezeable_workthread > > s/workthread/workqueue/ Yes. > > and ponder the end results of that grep. If you don't see something wrong, > > you're blind. > > This was a mistake, quite unrelated to the point you're making. Did you actually _do_ the "grep" (with the fixed argument)? I had two totally independent points. #1 was that you yourself have been fixing bugs in this area. #2 was the result of that grep. It's absolutely _empty_ except for the define to add that interface. NOBODY USES IT! Now, grep for the same interface that creates _non_freezeable workqueues. 
Put another way:

	[EMAIL PROTECTED] linux]$ git grep create_workqueue | wc -l
	35
	[EMAIL PROTECTED] linux]$ git grep create_freezeable_workqueue | wc -l
	1

and that _one_ hit you get for the "freezeable" case is not actually a user, it's the definition!

Ie my point is, nobody wants freezeable kernel threads. Absolutely nobody. Yet we have all this support for freezing them (or rather, we freeze them by default, and then we have all this support for _not_ doing that wrong default thing!)

So yes, I think it would be interesting to just stop freezing kernel threads. Totally.

		Linus
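Linus's proposal flips the default. User tasks are always frozen, and a kernel thread is frozen only if it explicitly asks to be, instead of freezing everything and exempting threads with `PF_NOFREEZE`. The following is a hypothetical user-space model of the two policies. `struct sim_task`, its fields and both `should_freeze_*()` functions are illustrative names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_task {
    bool is_kthread;     /* kernel thread, no user address space */
    bool wants_freeze;   /* opt-in, e.g. created as "freezeable" */
    bool nofreeze;       /* opt-out, the PF_NOFREEZE-style exemption */
};

/* Behaviour as described in the thread: freeze by default, exempt on request. */
static bool should_freeze_opt_out(const struct sim_task *t)
{
    return !t->nofreeze;
}

/* Proposed behaviour: user tasks always, kernel threads only on request. */
static bool should_freeze_opt_in(const struct sim_task *t)
{
    return !t->is_kthread || t->wants_freeze;
}
```

Under the opt-in rule, the grep counts above mean no kernel thread would be frozen at all, and no exemption flags would be needed.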
Re: Back to the future.
On Saturday, 28 April 2007 03:03, Kyle Moffett wrote: > On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: > > Hi. > > > > On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote: > >> It makes it harder to debug (wouldn't it be *nice* to just ssh in, > >> and do > >>gdb -p > > > > Make the machine being suspended a VM and you can already do that. > > >> when something goes wrong?) but we also *depend* on user space for > >> various things (the same way we depend on kernel threads, and why > >> it has been such a total disaster to try to freeze the kernel > >> threads too!). For example, if you want to do graphical stuff, > >> just using X would be quite nice, wouldn't it? > > > > But in doing so you make the contents of the disk inconsistent with > > the state you've just snapshotted, leading to filesystem > > corruption. Even if you modify filesystems to do checkpointing > > (which is what we're really talking about), you still also have the > > problem that your snapshot has to be stored somewhere before you > > write it to disk, so you also have to either [snip] > > Actually, it's a lot simpler than that. We can just combine the > device-mapper snapshot with a VM+kernel snapshot system call and be > almost done: > >sys_snapshot(dev_t snapblockdev, int __user *snapshotfd); > > When sys_snapshot is run, the kernel does: > > 1) Sequentially freeze mounted filesystems using blockdev freezing. > If it's an fs that doesn't support freezing then either fail or force-remount-ro > that fs and downgrade all its file descriptors to RO. > Doesn't need extra locking since processes which try to do IO either > succeed before the freeze call returns for that blockdev or sleep on > the unfreeze of that blockdev. Filesystems are synchronized and made > clean. > 2) Iterate over the userspace process list, freezing each process > and remapping all of its pages copy-on-write. Any device-specific > pages need to have state saved by that device.
Why do you want to do 2) after 1) and not vice versa? > 3) All processes (except kernel threads) are now frozen. > 4) Kernel should save internal state corresponding to current > userspace state. The kernel also swaps out excess pages to free up > enough RAM and prepares the snapshot file-descriptor with copies of > kernel memory and the original (pre-COW) mapped userspace pages. > 5) Kernel substitutes filesystems for either a device-mapper > snapshot with snapblockdev as backing storage or union with tmpfs and > remounts the underlying filesystems as read-only. > 6) Kernel unfreezes all userspace processes and returns the snapshot > FD to userspace (where it can be read from). Okay, but how do we do the error recovery if, for example, the image cannot be saved? > Then userspace can do whatever it wants. Any changes to filesystems > mounted at the time of snapshot will be discarded at shutdown. > Freshly mounted filesystems won't have the union or COW thing done, > and so you can write your snapshot to a compressed encrypted file on > a USB key if you want to, you just have to unmount it before the > snapshot() syscall and remount it right afterwards. This seems to be a good idea. Greetings, Rafael
Re: "REPORT: sd-0.46 vs cfs-v6 vs mainline 2.6.21-rc7 Beryl + Video + Audio"
On 27/04/07, hechacker1 <[EMAIL PROTECTED]> wrote:
"REPORT: sd-0.46 vs cfs-v6 vs mainline 2.6.21-rc7 Beryl + Video + Audio"

Hardware:
	Dell Inspiron 700m laptop
	1.7GHz Pentium M (Dothan 2M cache)
	2GB RAM
	1000Hz
	Gentoo Linux
	dyn-tick

700m # cat /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate
10000
(microseconds, 10ms)

	855gm integrated video/chipset
	xf86-video-i810 (intel 1.7.4) DRI enabled
	xorg-server-1.2.0-r3
	beryl-core 0.3.0-svn
	MPlayer dev-SVN-rUNKNOWN-4.1.2 - x11
	Gnome totem 2.16.5 - x11-gstreamer
	reiser4 w/cryptcompress

Screenshot: http://ordorica.org/misc/beryl.png

muine playing mp3's off mounted windows share

Tests run under 16 bit color, which provides a constant 75 fps on one cube side (fps forced limited). Drops to ~45-50 fps during animation/rotate/scale (depending on complexity of rendering). Vsync off. 75Hz refresh, 1280x800.

totem running fullscreen playing 700MB divx "An Inconvenient Truth.avi" on one side of cube/desktop.
gmplayer running fullscreen on another cube side (same file).

The given observations/numbers are from when I move the cube with my mouse and view two faces at one time (see screenshot). One face is playing the totem video, the other contains my terminals.

Some numbers I've seen other people throw around; I don't know their relevance.
cfs-v6:
700m kernel # cat sched_granularity_ns
500

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 5  0      0 221480    300 1394612    0    0   181     0 6068 5317 69  6 25  0
 4  0      0 220880    300 1395268    0    0   176     0 6147 5579 68  6 27  0
 1  0      0 220340    300 1395768    0    0   167     0 6052 5393 70  6 24  0
 6  0      0 219920    300 1396204    0    0   103     0 5830 5211 73  6 21  0

top - 18:31:17 up 7:45, 5 users, load average: 5.18, 4.73, 4.28
Tasks: 98 total, 4 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 91.6%us, 6.4%sy, 0.0%ni, 0.3%id, 0.0%wa, 1.3%hi, 0.3%si, 0.0%st
Mem:  2057700k total, 1845952k used,  211748k free,     300k buffers
Swap:  987988k total,       0k used,  987988k free, 1404040k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18046 hechacke  20   0  189m  83m  20m S 38.7  4.2  12:04.64 totem
18059 hechacke  20   0 51280  30m  18m R 25.8  1.5   9:47.36 gmplayer
12117 root      20   0  275m  54m  18m R 20.2  2.7  15:18.38 Xorg
22730 hechacke  20   0  119m  35m  18m R  5.3  1.7   0:12.68 mono
12350 hechacke  20   0 63820 6776 4328 S  3.6  0.3   2:20.36 beryl
16465 hechacke  20   0 43960  15m  10m S  2.3  0.8   0:07.14 gnome-terminal
12200 hechacke  20   0  5308 4016 1740 S  0.3  0.2   0:05.45 gconfd-2
12215 hechacke  20   0 38704 8956 7588 S  0.3  0.4   0:08.90 xfce4-clipman-p

Observation:
Music plays perfectly. Audio of the videos plays perfectly.
New processes take forever to start. Firefox (already cached in RAM) takes about 5 seconds to start, even right after closing it.
Browsing the web is slow.
Already open applications are responsive.

Behavior of video:
Both videos are moving forward. totem is updating about every half second. mplayer updates about every 3 seconds.
-

cfs-v6:
700m kernel # cat sched_granularity_ns
200

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 5  0      0  99604     44 1519364    0    0     0     0 3903 5575 91  5  5  0
 3  0      0  99512     44 1519364    0    0     0     0 5990 6783 72  5 23  0
 3  0      0 100412     44 1519364    0    0     0     0 6858 7261 67  5 28  0
 1  0      0 100412     44 1519364    0    0     0     0 7426 7634 62  4 34  0
 4  0      0 100288     44 1519364    0    0     0     0 7039 7442 60  6 34  0

top - 19:05:09 up 8:18, 5 users, load average: 3.62, 4.16, 4.28
Tasks: 98 total, 4 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 69.8%us, 5.0%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.7%hi, 0.0%si, 0.0%st
Mem:  2057700k total, 2009396k used,   48304k free,     300k buffers
Swap:  987988k total,       0k used,  987988k free, 1555428k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18059 hechacke  20   0 51332  30m  18m R 30.8  1.5  18:48.17 gmplayer
18046 hechacke  20   0  189m  83m  20m S 20.9  4.2  23:25.49 totem
12117 root      20   0  276m  57m  18m S  9.6  2.8  20:59.01 Xorg
22730 hechacke  20   0  129m  36m  18m R  8.6  1.8   1:28.59 mono
22930 hechacke  20   0 65480 8392 4320 S  4.0  0.4   0:53.38 beryl
12213 hechacke  20   0 34472 7680 6484 S  0.7  0.4   1:16.41 xfce4-battery-p

Observation:
Music plays perfectly. Audio of the videos plays perfectly.
New processes take forever to start.
Browsing the web is slow.
Already open applications are responsive.

Behavior of video:
Both videos are moving forward. totem is updating about
Re: Back to the future.
On Saturday, 28 April 2007 03:00, Matthew Garrett wrote: > On Fri, Apr 27, 2007 at 05:18:16PM -0700, Jeremy Fitzhardinge wrote: > > > Then you could use kexec for resume... > > While that would certainly be nifty, I think we're arguably starting > from the wrong point here. Why are we booting a kernel, trying to poke > the hardware back into some sort of mock-quiescent state, freeing memory > and then (finally) overwriting the entire contents of RAM rather than > just doing all of this from the bootloader? Given the time spent in > kernel setup and unpacking initramfs nowadays, I'm willing to bet it'd > still be faster even if you're stuck using int 13 on x86. Yes, that would be faster. > http://apcmag.com/5873/page14 suggests that Intel is looking into this, > but I haven't heard anything more yet. To the best of my knowledge, this > is also how Windows manages things. I think you're right. Greetings, Rafael
Re: Back to the future.
On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote:
> Hi.
>
> On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote:
>> It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do
>>	gdb -p
>
> Make the machine being suspended a VM and you can already do that.
>
>> when something goes wrong?) but we also *depend* on user space for various things (the same way we depend on kernel threads, and why it has been such a total disaster to try to freeze the kernel threads too!). For example, if you want to do graphical stuff, just using X would be quite nice, wouldn't it?
>
> But in doing so you make the contents of the disk inconsistent with the state you've just snapshotted, leading to filesystem corruption. Even if you modify filesystems to do checkpointing (which is what we're really talking about), you still also have the problem that your snapshot has to be stored somewhere before you write it to disk, so you also have to either [snip]

Actually, it's a lot simpler than that. We can just combine the device-mapper snapshot with a VM+kernel snapshot system call and be almost done:

	sys_snapshot(dev_t snapblockdev, int __user *snapshotfd);

When sys_snapshot is run, the kernel does:

1) Sequentially freeze mounted filesystems using blockdev freezing. If it's an fs that doesn't support freezing then either fail or force-remount-ro that fs and downgrade all its file descriptors to RO. Doesn't need extra locking since processes which try to do IO either succeed before the freeze call returns for that blockdev or sleep on the unfreeze of that blockdev. Filesystems are synchronized and made clean.
2) Iterate over the userspace process list, freezing each process and remapping all of its pages copy-on-write. Any device-specific pages need to have state saved by that device.
3) All processes (except kernel threads) are now frozen.
4) Kernel should save internal state corresponding to current userspace state. The kernel also swaps out excess pages to free up enough RAM and prepares the snapshot file-descriptor with copies of kernel memory and the original (pre-COW) mapped userspace pages.
5) Kernel replaces the mounted filesystems with either a device-mapper snapshot (with snapblockdev as backing storage) or a union with tmpfs, and remounts the underlying filesystems read-only.
6) Kernel unfreezes all userspace processes and returns the snapshot FD to userspace (where it can be read from).

Then userspace can do whatever it wants. Any changes to filesystems mounted at the time of snapshot will be discarded at shutdown. Freshly mounted filesystems won't have the union or COW thing done, and so you can write your snapshot to a compressed encrypted file on a USB key if you want to; you just have to unmount it before the snapshot() syscall and remount it right afterwards.

Cheers,
Kyle Moffett
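Steps 2 and 4 rely on copy-on-write. After the snapshot point the running system keeps writing, and the first write to each page must preserve that page's pre-COW contents for the image. Below is a minimal user-space sketch of that bookkeeping. It assumes a fixed array of tiny pages, and `struct cow_snapshot`, `snap_begin()`, `snap_write()` and `snap_read_image()` are hypothetical names, not part of the proposed `sys_snapshot()` interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES  4
#define PAGE_SZ 8

struct cow_snapshot {
    char live[NPAGES][PAGE_SZ];   /* memory the unfrozen system keeps using */
    char saved[NPAGES][PAGE_SZ];  /* pre-COW copies, filled lazily */
    bool copied[NPAGES];          /* has this page been preserved yet? */
    bool active;
};

/* Snapshot point: nothing is copied yet, every page is just marked COW. */
static void snap_begin(struct cow_snapshot *s)
{
    memset(s->copied, 0, sizeof(s->copied));
    s->active = true;
}

/* A write after the snapshot point first preserves the original page. */
static void snap_write(struct cow_snapshot *s, int page, int off, char c)
{
    if (s->active && !s->copied[page]) {
        memcpy(s->saved[page], s->live[page], PAGE_SZ);
        s->copied[page] = true;
    }
    s->live[page][off] = c;
}

/* The image sees snapshot-time contents: the saved copy if any, else the live page. */
static const char *snap_read_image(const struct cow_snapshot *s, int page)
{
    return s->copied[page] ? s->saved[page] : s->live[page];
}
```

Only pages touched after the snapshot point cost a copy. That is what would let step 6 unfreeze userspace before the image has been written out.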
Re: Back to the future.
Matthew Garrett wrote: > While that would certainly be nifty, I think we're arguably starting > from the wrong point here. Why are we booting a kernel, trying to poke > the hardware back into some sort of mock-quiescent state, freeing memory > and then (finally) overwriting the entire contents of RAM rather than > just doing all of this from the bootloader? Sure, you could make suspend generate a complete bootable kernel image containing all RAM. Doesn't sound too hard to me. You know, from over here on the sidelines. J
Re: Back to the future.
On Fri, Apr 27, 2007 at 05:18:16PM -0700, Jeremy Fitzhardinge wrote: > Then you could use kexec for resume... While that would certainly be nifty, I think we're arguably starting from the wrong point here. Why are we booting a kernel, trying to poke the hardware back into some sort of mock-quiescent state, freeing memory and then (finally) overwriting the entire contents of RAM rather than just doing all of this from the bootloader? Given the time spent in kernel setup and unpacking initramfs nowadays, I'm willing to bet it'd still be faster even if you're stuck using int 13 on x86. http://apcmag.com/5873/page14 suggests that Intel is looking into this, but I haven't heard anything more yet. To the best of my knowledge, this is also how Windows manages things. -- Matthew Garrett | [EMAIL PROTECTED]
Re: Back to the future.
On Saturday, 28 April 2007 01:59, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > Actually, the less things happen while we're creating and saving the image, > > the less sources of potential problems there are and by freezing the kernel > > threads (not all of them), we cause less things to happen at that time. > > That makes no sense. > > You have to create the snapshot image with interrupts disabled *anyway*. > > I really don't see how you can say that stopping threads etc can make any > difference what-so-ever. If you don't create the snapshot with interrupts > disabled (and just with a single CPU running) you have so many other > problems that it's not even remotely funny. > > So there's *by*definition* nothing at all that can happen while you > snapshot the system. Claiming otherwise is just silly. For creating the snapshot alone, it doesn't matter, except that the restore is a bit cleaner (we know exactly what all of these threads will be doing when we restore the image and enable the IRQs after that). Still, I think that kernel threads can potentially hold locks across the freezing of devices and image creation, and that is fishy. Also I believe, although I'm not 100% sure, that some of them may cause problems to appear after we've created the image and while we are saving it. > > To make you happy, we could stop doing that, but what actual _advantage_ > > that would bring? > > Like getting rid of all the magic "I don't want you to freeze me" crud? And what exactly is wrong with it? > Or getting rid of this horribly idiotic "three times widdershins" kind of > black magic mentality! It looks like the main reason for the process > freezing has nothing to do with technology, but some irrational fear of > other things happening at the same time, even though they CANNOT happen if > you do things even half-way sanely. > > The "let's stop all kernel threads" is superstition.
It's the same kind of > superstition that made people write "sync" three times before turning off > the power in the olden times. It's the kind of superstition that comes > from "we don't do things right, so let's be vewy vewy quiet and _pray_ > that it works when we are being quiet". > > That's bad. Okay. Incidentally, I'm working on a freezer patch, so I'll probably drop the freezing of kernel threads from swsusp in it and we'll see what happens. Let's do the experiment, shall we? > It's doubly bad, because that idiocy has also infected s2ram. Again, > another thing that really makes no sense at all - and we do it not just > for snapshotting, but for s2ram too. Can you tell me *why*? Why we freeze tasks at all or why we freeze kernel threads? > > > Trying to freeze kernel threads has _caused_ problems. It has _added_ > > > these interdependencies. It hasn't removed a single dependency at any > > > time, it has just added new problems! > > > > What problems are you talking about? > > Like you wouldn't know. Look at commit b43376927a that you yourself are > credited with, just a month ago. > > Then, do something as simple as > > git grep create_freezeable_workthread s/workthread/workqueue/ > and ponder the end results of that grep. If you don't see something wrong, > you're blind. This was a mistake, quite unrelated to the point you're making. And actually, I was trying to fix a problem with two kernel threads that we thought might submit I/O to disk after the image had been created. Otherwise I wouldn't have thought of making that change. > > > NONE of these are valid explanations at all. You're listing totally > > > theoretical problems, and ignoring all the _real_ problems that trying to > > > freeze kernel threads has _caused_. > > > > Example, please? > > Who do you think you are kidding? See above. Well, if someone does something in a wrong way, that need not mean the thing he was trying to do was wrong. Somehow, I knew you would point at this ...
> And if you think that's an isolated example, look again. And start > grepping for PF_NOFREEZE, and other examples. May I say I'm not convinced? > The fact is, there is not a *single* reason to freeze kernel threads. But > some rocket scientist decided to, and then screwed everybody else over. At least _that_ wasn't me. :-) Greetings, Rafael
Re: Back to the future.
Linus Torvalds writes: > I really don't see how you can say that stopping threads etc can make any > difference what-so-ever. If you don't create the snapshot with interrupts > disabled (and just with a single CPU running) you have so many other > problems that it's not even remotely funny. I agree. I don't like the freezer. We have had working kernel-controlled suspend to RAM on powerbooks for almost 10 years now, and we never needed to freeze processes. That said, I can see two attractions in freezing processes: 1. It provides a way to stop new I/O requests coming in, and thus somewhat makes up for the lack of a way to freeze device request queues (at least, we didn't have one last time I looked). 2. Systems do sometimes die while suspended (e.g. run out of battery, or the resume process fails), and to make the next boot painless, you want the filesystems on disk to be as clean as possible. Freezing processes and then doing a sync provides one way to achieve that. Of course, you have to make sure you don't freeze any kernel threads that are needed for doing the sync... And if one of your filesystems is using FUSE, it's not going to get very far. Paul.
[git patches] net driver fixes
As mentioned previously, the big batch queued for 2.6.22 is coming after the dust settles.

[EMAIL PROTECTED] folks: the sis900 patch should be in 2.6.21.x

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/sis900.c      |    9 +++++----
 drivers/usb/net/pegasus.c |   10 ----------
 drivers/usb/net/pegasus.h |    3 +--
 3 files changed, 6 insertions(+), 16 deletions(-)

Dan Williams (1):
      usb-net/pegasus: simplify carrier detection

Neil Horman (1):
      sis900: Allocate rx replacement buffer before rx operation

diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c
index dea0126..2cb2e15 100644
--- a/drivers/net/sis900.c
+++ b/drivers/net/sis900.c
@@ -1753,6 +1753,7 @@ static int sis900_rx(struct net_device *net_dev)
 			sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
 		} else {
 			struct sk_buff * skb;
+			struct sk_buff * rx_skb;
 
 			pci_unmap_single(sis_priv->pci_dev,
 				sis_priv->rx_ring[entry].bufptr, RX_BUF_SIZE,
@@ -1786,10 +1787,10 @@ static int sis900_rx(struct net_device *net_dev)
 			}
 
 			/* give the socket buffer to upper layers */
-			skb = sis_priv->rx_skbuff[entry];
-			skb_put(skb, rx_size);
-			skb->protocol = eth_type_trans(skb, net_dev);
-			netif_rx(skb);
+			rx_skb = sis_priv->rx_skbuff[entry];
+			skb_put(rx_skb, rx_size);
+			rx_skb->protocol = eth_type_trans(rx_skb, net_dev);
+			netif_rx(rx_skb);
 
 			/* some network statistics */
 			if ((rx_status & BCAST) == MCAST)
diff --git a/drivers/usb/net/pegasus.c b/drivers/usb/net/pegasus.c
index 1ad4ee5..a05fd97 100644
--- a/drivers/usb/net/pegasus.c
+++ b/drivers/usb/net/pegasus.c
@@ -847,16 +847,6 @@ static void intr_callback(struct urb *urb)
 		 * d[0].NO_CARRIER kicks in only with failed TX.
 		 * ... so monitoring with MII may be safest.
 		 */
-		if (pegasus->features & TRUST_LINK_STATUS) {
-			if (d[5] & LINK_STATUS)
-				netif_carrier_on(net);
-			else
-				netif_carrier_off(net);
-		} else {
-			/* Never set carrier _on_ based on ! NO_CARRIER */
-			if (d[0] & NO_CARRIER)
-				netif_carrier_off(net);
-		}
 
 		/* bytes 3-4 == rx_lostpkt, reg 2E/2F */
 		pegasus->stats.rx_missed_errors += ((d[3] & 0x7f) << 8) | d[4];
diff --git a/drivers/usb/net/pegasus.h b/drivers/usb/net/pegasus.h
index c7aadb4..c746782 100644
--- a/drivers/usb/net/pegasus.h
+++ b/drivers/usb/net/pegasus.h
@@ -11,7 +11,6 @@
 #define	PEGASUS_II		0x8000
 #define	HAS_HOME_PNA		0x4000
-#define	TRUST_LINK_STATUS	0x2000
 #define	PEGASUS_MTU		1536
 #define	RX_SKBS			4
@@ -204,7 +203,7 @@ PEGASUS_DEV( "AEI USB Fast Ethernet Adapter", VENDOR_AEILAB, 0x1701,
 PEGASUS_DEV( "Allied Telesyn Int. AT-USB100", VENDOR_ALLIEDTEL, 0xb100,
 		DEFAULT_GPIO_RESET | PEGASUS_II )
 PEGASUS_DEV( "Belkin F5D5050 USB Ethernet", VENDOR_BELKIN, 0x0121,
-		DEFAULT_GPIO_RESET | PEGASUS_II | TRUST_LINK_STATUS )
+		DEFAULT_GPIO_RESET | PEGASUS_II )
 PEGASUS_DEV( "Billionton USB-100", VENDOR_BILLIONTON, 0x0986,
 		DEFAULT_GPIO_RESET )
 PEGASUS_DEV( "Billionton USBLP-100", VENDOR_BILLIONTON, 0x0987,
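The sis900 commit title names a common receive-ring pattern. The driver obtains the replacement buffer first and only then hands the filled buffer upstream, so an allocation failure drops the packet and leaves the ring slot populated. The hunks above show only part of such a change (separate `skb`/`rx_skb` variables). Below is a user-space sketch of the general pattern, not of the driver itself. `malloc()` stands in for `dev_alloc_skb()`, a `deliver` callback stands in for `netif_rx()`, and `struct rx_ring`, `rx_one()` and `alloc_fail()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SIZE 4
#define BUF_SIZE  1536

struct rx_ring {
    void *buf[RING_SIZE];        /* one receive buffer owned by each slot */
    unsigned long dropped;       /* packets dropped for lack of memory */
};

typedef void *(*alloc_fn)(size_t);   /* allocator that may fail */
typedef void (*deliver_fn)(void *);  /* upper layer; takes ownership of the buffer */

/* Simulated memory pressure: an allocator that always fails. */
static void *alloc_fail(size_t n)
{
    (void)n;
    return NULL;
}

/* Handle one completed slot. The replacement is allocated *before* the filled
 * buffer is passed up, so the slot is never left without a buffer. */
static void rx_one(struct rx_ring *r, int entry, alloc_fn alloc, deliver_fn deliver)
{
    void *replacement = alloc(BUF_SIZE);

    if (replacement == NULL) {
        r->dropped++;                /* drop the packet, recycle the old buffer */
        return;
    }
    deliver(r->buf[entry]);          /* upper layer now owns the old buffer */
    r->buf[entry] = replacement;     /* slot refilled */
}
```

Allocating first trades one dropped packet under memory pressure for never having an empty ring slot that the hardware could DMA into.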
Re: Back to the future.
On Fri, 27 Apr 2007, David Lang wrote:
>
> all that's needed for the snapshot is to prevent userspace from scheduling,

Strictly speaking, all you *really* want to make sure is not so much that user-space isn't scheduling, as the fact that all device IO buffers must be empty.

We can trivially snapshot an active user-space, and in fact it would probably be hard to do a snapshot in a way that it could even *know* or care about whether there are user-space processes running at the time of the snapshot. So that's not the real problem.

What we obviously *cannot* snapshot is if some particular device is in the middle of being written to or read from, and has outstanding commands on the device itself (as opposed to just queued to the driver).

So what we do want to make sure happens is that there are no IO queues that are active. And the best way to make sure that there are no IO queues active is to make sure that there are no new read or write-requests. And *that* you can do two ways:

 - actually intercepting the read/write requests. Probably not too hard, we could literally do it in the IO scheduler (and probably much more easily than doing it in the process scheduler), but the easy cases will only cover the block device layer, and character devices don't have the same kind of scheduler you can trap IO in.

 - we also don't want to generate new data that needs to be snapshotted, so we want to trap people who write even just to the page cache and turn pages dirty. Again, we could probably do it at *that* point (ie trapping them when they try to dirty a page), and it would be more logical, but again, there are other cases of people who generate more data (just any memory allocation obviously is a special case of generating more data to be snapshotted),

so I do agree that we want to stop producing new data to be snapshotted, and we want to stop producing new read-requests.

But kernel threads really do neither: in an idle system, kernel threads are idle too. A kernel thread is not like a user program that actually generates data - they only tend to act on behalf of other processes' needs.

So I think that what snapshotting really *wants* to stop is not scheduling per se, but IO. And stopping user processes (as opposed to kernel threads) is probably a good way to get there.

In fact, I'd argue that you want to stop user space and then encourage some kernel threads to *start* running, notably things like bdflush should probably be kicked to clean up some dirty stuff as part of the "shrink data to be snapshotted" part. Trying to free memory will do that on its own, of course.

		Linus
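Linus's first option, trapping new requests at the IO-scheduler level, amounts to a gate. Once the gate is closed, new submissions are held back, and the snapshot may proceed only when every request already sent to the device has completed. The following is a single-threaded, hypothetical model. `struct io_gate`, `gate_submit()`, `gate_complete()`, `gate_quiescent()` and `gate_reopen()` are illustrative names, not block-layer APIs.

```c
#include <assert.h>
#include <stdbool.h>

struct io_gate {
    bool closed;     /* set when the snapshot wants the device quiet */
    int in_flight;   /* requests issued to the device, not yet completed */
    int held;        /* requests parked while the gate is closed */
};

/* Submit a request: issue it if the gate is open, otherwise park it. */
static void gate_submit(struct io_gate *g)
{
    if (g->closed)
        g->held++;
    else
        g->in_flight++;
}

/* The device finished one outstanding request. */
static void gate_complete(struct io_gate *g)
{
    assert(g->in_flight > 0);
    g->in_flight--;
}

/* Safe to snapshot: no new IO can start and nothing is outstanding. */
static bool gate_quiescent(const struct io_gate *g)
{
    return g->closed && g->in_flight == 0;
}

/* After resume: reopen and issue everything that was parked. */
static void gate_reopen(struct io_gate *g)
{
    g->closed = false;
    g->in_flight += g->held;
    g->held = 0;
}
```

As the mail points out, such a gate covers only block-layer requests. Page-cache dirtying and character devices would still need to be trapped elsewhere.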
Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2
Andrew Morton wrote:
> --- a/mm/memory.c~add-apply_to_page_range-which-applies-a-function-to-a-pte-range-fix
> +++ a/mm/memory.c
> @@ -1455,7 +1455,7 @@ static int apply_to_pte_range(struct mm_
>  	pte_t *pte;
>  	int err;
>  	struct page *pmd_page;
> -	spinlock_t *ptl;
> +	spinlock_t *ptl = ptl;	/* Suppress gcc warning */
>

Sigh. I guess so.

    J
Re: [GIT PATCH] UIO patches for 2.6.21
On Saturday 28 April 2007 01:04, Andrew Morton wrote: > On Fri, 27 Apr 2007 15:49:57 -0700 > > Greg KH <[EMAIL PROTECTED]> wrote: > > Here are the updated UIO (Userspace I/O driver framework) patches for > > 2.6.21. > > I'm a bit uncertain about the whole UIO idea, really. I have this vague > feeling that we'd prefer to encourage people to move device drivers into > GPL'ed kernel rather than encouraging them to do closed-source userspace > implementations which will probably end up being slower, less reliable and > unavailable on various architectures, distros, etc. > > But I don't think I have the capacity to actually think about this further > - just tossing it out there ;) Thanks for tossing it out ;-) I understand your uncertainty and I share your opinion about encouraging industry developers to GPL their drivers. It really took me some time until I understood that sometimes there are _good_ reasons for a closed driver. UIO is not intended for mass products like graphics cards. We're talking about companies who developed special hardware for use in special applications like machine control. They sometimes need to keep a part of their driver closed, at least for some time. Sometimes it's because they want to protect themselves, sometimes because their customer demands it. Usually, they know about the disadvantages you mentioned (if they're our customers, be sure we tell them!). Anyway, UIO is not just a system to allow closed drivers. There are enough other reasons why these industry developers want userspace drivers. The most important one is that they're usually not experienced kernel developers. They can let somebody write the kernel part for them, and then write their driver using the tools and libraries they know, with floating point and all that stuff. It's just convenient. If I had to write a driver for a fieldbus card today, I'd use UIO. And I'd make it free software. UIO doesn't force anybody to close his drivers.
> > > They have been revamped from the last time you have seen them, and they > > include a real driver, the Hilscher CIF DeviceNet and Profibus card > > controller, which is being used in production systems with this driver > > framework right now. The kernel driver they replaced was a total mess, > > with over 60+ ioctls to try to control the different aspects of the > > device. See the last patch in this series for more details on this > > driver. > > > > These patches include full documentation, are self-contained from the > > rest of the kernel, and have been in the -mm tree for the past few > > months with no complaints. > > > > Please pull from: > > master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6.git/ > > > > Patches will be sent as a follow-on to this message to lkml for people > > to see. > > > > drivers/uio/uio_cif.c | 156 > > eh? How come a particular device requires 156 lines of kernel code to > support a userspace driver? Doesn't that kind of defeat the point? This is quite a large kernel module for a UIO device due to quite stupid hardware design. It needs two memory mappings, and the interrupt handler is not the simplest thing possible. BTW, I don't think that 156 lines is so much. It can handle quite a complex PCI card. And it's so simple that it can even be explained to industry programmers who are not kernel gurus. Thanks, Hans
Re: Back to the future.
On Sat, 28 Apr 2007, Nigel Cunningham wrote: Hi. On Sat, 2007-04-28 at 01:45 +0200, Rafael J. Wysocki wrote: On Saturday, 28 April 2007 01:17, Linus Torvalds wrote: On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: And can you name a _single_ advantage of doing so? Yes. We have a lot less interdependencies to worry about during the whole operation. That's not an advantage. That's why it has *sucked*. Actually, the less things happen while we're creating and saving the image, the less sources of potential problems there are and by freezing the kernel threads (not all of them), we cause less things to happen at that time. To make you happy, we could stop doing that, but what actual _advantage_ that would bring? A couple of other advantages to freezing other processes: 1) It makes predicting how much memory is available for making and saving snapshot a tractable problem. It therefore makes hibernation _much_ more reliable. 2) Racing against other processes would also make hibernation slower, increasing the chances of your battery running out before the save is complete. 3) It makes finding potential memory leaks in the code possible. It was ages ago now, but at one stage I could display a table saying exactly how many pages had been allocated and freed by different sections of the process and compare the number of free pages at the start and end of the cycle to ensure there were no memory leaks at all. nobody is suggesting that you leave processes running while you do the snapshot, what is being proposed is 1. pause userspace (prevent scheduling) 2. make snapshot image of memory 3. make mounted filesystems read-only (possibly with snapshot/checkpoint) 4. unpause 5. save image (with full userspace available, including network) 6. shutdown system (throw away all userspace memory, no need to do graceful shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if needed) NONE of these are valid explanations at all.
You're listing totally theoretical problems, and ignoring all the _real_ problems that trying to freeze kernel threads has _caused_. Example, please? I agree with Rafael. Freezing processes greatly helps in ensuring we have a consistent image. He's right, too, in asserting that it's even more important for Suspend2. Freezing processes is essential to being able to know that those LRU pages won't change and therefore being able to save them separately and then reuse them for the atomic copy. all that's needed for the snapshot is to prevent userspace from scheduling, and prevent media from being written to in a permanent way (writing to an LVM volume after invoking a snapshot doesn't count, just revert to the snapshot) David Lang
Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2
On Thu, 26 Apr 2007 20:25:19 +0200 Borislav Petkov <[EMAIL PROTECTED]> wrote: > > Remove build warning mm/memory.c:1491: warning: 'ptl' may be used > uninitialized in this function. > The spinlock pointer is assigned to null since it gets overwritten right away > in > pte_alloc_map_lock(). > > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]> > --- > > Index: linux-mm/mm/memory.c > === > --- linux-mm.orig/mm/memory.c 2007-04-26 19:57:14.0 +0200 > +++ linux-mm/mm/memory.c 2007-04-26 20:00:30.0 +0200 > @@ -1488,7 +1488,7 @@ > pte_t *pte; > int err; > struct page *pmd_page; > - spinlock_t *ptl; > + spinlock_t *ptl = NULL; > > pte = (mm == _mm) ? > pte_alloc_kernel(pmd, addr) : > yes, I've been staring unhappily at this for some time. Your change adds seven bytes of text to this function for no runtime benefit, just to fix a build-time warning. It's a general problem. Often we just leave the warning in place and curse gcc each time it flies past. Sometimes the code can be restructured in a sensible fashion to avoid the warning; often it cannot. But I don't think I want to put up with a warning coming out of core MM all the time so let's go with the following silliness which adds no additional runtime cost. --- a/mm/memory.c~add-apply_to_page_range-which-applies-a-function-to-a-pte-range-fix +++ a/mm/memory.c @@ -1455,7 +1455,7 @@ static int apply_to_pte_range(struct mm_ pte_t *pte; int err; struct page *pmd_page; - spinlock_t *ptl; + spinlock_t *ptl = ptl; /* Suppress gcc warning */ pte = (mm == _mm) ? pte_alloc_kernel(pmd, addr) : _
Re: Back to the future.
On Fri, 27 Apr 2007, Linus Torvalds wrote: > > The "let's stop all kernel threads" is superstition. It's the same kind of > superstition that made people write "sync" three times before turning off > the power in the olden times. It's the kind of superstition that comes > from "we don't do things right, so let's be vewy vewy quiet and _pray_ > that it works when we are being quiet". Side note: while I think things should probably *work* even with user processes going full bore while a snapshot is taken, I'll freely admit that I'll follow that superstition far enough that I think it's probably a good idea to try to quiesce the system to _some_ degree, and that stopping user programs is a good idea. Partly because of the whole memory shrinking thing, and partly just because we should do the snapshot with hw IO queues empty. But I don't think it would necessarily be wrong (and in many ways it would probably be *right*) to do that IO queue stopping at the queue level rather than at a process level. Why stop processes just because you want to clean out IO queues? They are two totally different things! Linus
Re: Back to the future.
Linus Torvalds wrote: > On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > >> Why do you think that keeping the user space frozen after 'snapshot' is a bad >> idea? I think that solves many of the problems you're discussing. >> > > It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do > > gdb -p > > when something goes wrong?) Yeah, or gdb vmlinux snapshot Then you could use kexec for resume... J
Re: [PATCH 1/2] ehea: fix for sysfs entries
Thomas Klein wrote: Create symbolic link from each logical port to ehea driver Signed-off-by: Thomas Klein <[EMAIL PROTECTED]> --- This patch applies on top of the netdev upstream branch for 2.6.22 applied 1-2
Re: 2.6.21-rc7-mm2 -- x86_64 blade hard hangs
On Fri, 27 Apr 2007, Siddha, Suresh B wrote: On Fri, Apr 27, 2007 at 12:07:10PM +0100, Mel Gorman wrote: On (26/04/07 16:40), Siddha, Suresh B didst pronounce: oops. Appended patch should fix this. Can you please check this and Ack it? This patch does not apply cleanly to 2.6.21-rc7-mm2. Mel, please back out the existing x86_64-set-node_possible_map-at-runtime.patch in rc7-mm2 and apply the appended patch instead. I backed out broken-out/x86_64-set-node_possible_map-at-runtime.patch broken-out/x86_64-set-node_possible_map-at-runtime-fix.patch broken-out/x86_64-set-node_possible_map-at-runtime-fix-2.patch and dropped in your new patch. It passed boot tests on the machine in question, so just from a testing perspective Acked-by: Mel Gorman <[EMAIL PROTECTED]> Andrew, as you already backed out x86_64-set-node_possible_map-at-runtime.patch from your -mm series, please include the appended patch (as try 2), after Mel confirms that it works fine on his setup. Thanks! Thank you. --- From: Suresh Siddha <[EMAIL PROTECTED]> [patch] x86_64: set node_possible_map at runtime - try 2 Set the node_possible_map at runtime on x86_64. On a non-NUMA system, num_possible_nodes() will now say '1'.
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]> Cc: Andi Kleen <[EMAIL PROTECTED]> Cc: Eric Dumazet <[EMAIL PROTECTED]> Cc: David Rientjes <[EMAIL PROTECTED]> Cc: Christoph Lameter <[EMAIL PROTECTED]> --- diff -pNru linux/arch/x86_64/mm/k8topology.c linux~/arch/x86_64/mm/k8topology.c --- linux/arch/x86_64/mm/k8topology.c 2007-04-27 10:37:19.0 -0700 +++ linux~/arch/x86_64/mm/k8topology.c 2007-04-27 10:34:10.0 -0700 @@ -49,11 +49,8 @@ int __init k8_scan_nodes(unsigned long s int found = 0; u32 reg; unsigned numnodes; - nodemask_t nodes_parsed; unsigned dualcore = 0; - nodes_clear(nodes_parsed); - if (!early_pci_allowed()) return -1; @@ -102,7 +99,7 @@ int __init k8_scan_nodes(unsigned long s nodeid, (base>>8)&3, (limit>>8) & 3); return -1; } - if (node_isset(nodeid, nodes_parsed)) { + if (node_isset(nodeid, node_possible_map)) { printk(KERN_INFO "Node %d already present. Skipping\n", nodeid); continue; @@ -155,7 +152,7 @@ int __init k8_scan_nodes(unsigned long s prevbase = base; - node_set(nodeid, nodes_parsed); + node_set(nodeid, node_possible_map); } if (!found) diff -pNru linux/arch/x86_64/mm/numa.c linux~/arch/x86_64/mm/numa.c --- linux/arch/x86_64/mm/numa.c 2007-04-27 10:37:19.0 -0700 +++ linux~/arch/x86_64/mm/numa.c2007-04-27 10:34:10.0 -0700 @@ -298,7 +298,7 @@ static int __init setup_node_range(int n ret = -1; } nodes[nid].end = *addr; - node_set_online(nid); + node_set(nid, node_possible_map); printk(KERN_INFO "Faking node %d at %016Lx-%016Lx (%LuMB)\n", nid, nodes[nid].start, nodes[nid].end, (nodes[nid].end - nodes[nid].start) >> 20); @@ -482,7 +482,7 @@ out: * SRAT. 
*/ remove_all_active_ranges(); - for_each_online_node(i) { + for_each_node_mask(i, node_possible_map) { e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT, nodes[i].end >> PAGE_SHIFT); setup_node_bootmem(i, nodes[i].start, nodes[i].end); @@ -497,20 +497,25 @@ void __init numa_initmem_init(unsigned l { int i; + nodes_clear(node_possible_map); + #ifdef CONFIG_NUMA_EMU if (cmdline && !numa_emulation(start_pfn, end_pfn)) return; + nodes_clear(node_possible_map); #endif #ifdef CONFIG_ACPI_NUMA if (!numa_off && !acpi_scan_nodes(start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT)) return; + nodes_clear(node_possible_map); #endif #ifdef CONFIG_K8_NUMA if (!numa_off && !k8_scan_nodes(start_pfn< -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > Actually, the less things happen while we're creating and saving the image, > the less sources of potential problems there are and by freezing the kernel > threads (not all of them), we cause less things to happen at that time. That makes no sense. You have to create the snapshot image with interrupts disabled *anyway*. I really don't see how you can say that stopping threads etc can make any difference what-so-ever. If you don't create the snapshot with interrupts disabled (and just with a single CPU running) you have so many other problems that it's not even remotely funny. So there's *by*definition* nothing at all that can happen while you snapshot the system. Claiming otherwise is just silly. > To make you happy, we could stop doing that, but what actual _advantage_ > that would bring? Like getting rid of all the magic "I don't want you to freeze me" crud? Or getting rid of this horribly idiotic "three times widdershins" kind of black magic mentality! It looks like the main reason for the process freezing has nothing to do with technology, but some irrational fear of other things happening at the same time, even though they CANNOT happen if you do things even half-way sanely. The "let's stop all kernel threads" is superstition. It's the same kind of superstition that made people write "sync" three times before turning off the power in the olden times. It's the kind of superstition that comes from "we don't do things right, so let's be vewy vewy quiet and _pray_ that it works when we are being quiet". That's bad. It's doubly bad, because that idiocy has also infected s2ram. Again, another thing that really makes no sense at all - and we do it not just for snapshotting, but for s2ram too. Can you tell me *why*? > > Trying to freeze kernel threads has _caused_ problems. It has _added_ > > these interdependencies. It hasn't removed a single dependency at any > > time, it has just added new problems!
> > What problems are you talking about? Like you wouldn't know. Look at commit b43376927a that you yourself are credited with, just a month ago. Then, do something as simple as git grep create_freezeable_workthread and ponder the end results of that grep. If you don't see something wrong, you're blind. > > NONE of these are valid explanations at all. You're listing totally > > theoretical problems, and ignoring all the _real_ problems that trying to > > freeze kernel threads has _caused_. > > Example, please? Who do you think you are kidding? See above. And if you think that's an isolated example, look again. And start grepping for PF_NOFREEZE, and other examples. The fact is, there is not a *single* reason to freeze kernel threads. But some rocket scientist decided to, and then screwed everybody else over. Linus
Re: Back to the future.
On Saturday, 28 April 2007 01:01, David Lang wrote: > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > On Saturday, 28 April 2007 00:26, David Lang wrote: > >> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > >> > > We're freezing many of them just fine. ;-) > > And can you name a _single_ advantage of doing so? > >>> > >>> Yes. We have a lot less interdependencies to worry about during the whole > >>> operation. > >>> > It so happens, that most people wouldn't notice or care that kmirrord got > frozen (kernel thread picked at random - it might be one of the threads > that has gotten special-cased to not do that), but I have yet to hear a > single coherent explanation for why it's actually a good idea in the > first > place. > >>> > >>> Well, I don't know if that's a 'coherent' explanation from your point of > >>> view > >>> (probably not), but I'll try nevertheless: > >>> 1) if the kernel threads are frozen, we know that they don't hold any > >>> locks > >>> that could interfere with the freezing of device drivers, > >> > >> does the process of freezing really wait until all locks have been > >> released? > > > > Yes, it does. > > > >>> 2) if they are frozen, we know, for example, that they won't call user > >>> mode > >>> helpers or do similar things, > >> > >> this won't matter unless the user mode helpers are going to do I/O or other > >> permanent changes > > > > Please note that even accessing a file may be a permanent change. > > if accessing a file on a read-only filesystem changes that filesystem it's a > bug > > see the recent thread about ext3 journal replays when mounting read-only as > an > example. Oh well. Is it really wrong to protect users from such bugs, if we can do that? > >>> 3) if they are frozen, we know that they won't submit I/O to disks and > >>> potentially damage filesystems (suspend2 has much more problems with that > >>> than swsusp, but still.
And yes, there have been bug reports related to > >>> it, > >>> so it's not just my fantasy). > >> > >> if you have the filesystems checkpointed then I/O after the freeze won't > >> matter > >> as you just revert to the checkpoint (and since this is going to be thrown > >> away > >> it can stay in ram) > > > > In that case, I would agree. Currently, however, we're not even close to > > this > > point. > > > > The checkpointing of filesystems would be a very welcome feature, but > > there's > > no one working on it right now, AFAICT. > > > >> if we are willing to make a break with the past to implement the new > >> snapshot > >> capability, we should be able to use the LVM snapshot code to handle the > >> filesystem > > > > Yes, we can do that, in principle, and screw all of the current users in the > > process. And finally we'd end up with something similar to what is done > > now, > > IMHO. > > however, the result may be a lot less 'special case power management' code > and a Are you referring to some specific code? > lot more re-use of code that's in place for other uses. This is already happening. > if work on the current versions was stopped (other than trying to avoid > regressions) and a new version (with new userspace tools) was built in a way > that satisfies everyone the old version could be phased out in a year or two > (per the normal feature removal process) May I say it's not realistic? > > And no, the things are not just totally broken, as it may follow from these > > discussions. The problem is that the people who are discussing them so > > viciously have never tried to write anything like the hibernation code. > > > > This is as though I were discussing the design of the CPU schedulers, > > although I only know how they work on a general level.
> > > > Actually, the really problematic thing with the hibernation _right_ _now_ is > > what Linus is so concerned about (and rightfully so) - that we use the > > same device drivers' callbacks for the hibernation and suspend (aka s2ram). > > The other things work quite well and are really robust. > > if simply splitting the functions cleans everything up enough to satisfy > everyone then we're almost done right? ;-) Practically, yes. Theoretically, there's no software you can't improve (except, probably, TeX), but that might not be worth the effort. > however I think that there are other fundamental disagreements here, and > neither > the 'do absolutely everything in the kernel' nor the 'do almost nothing in the > kernel' approaches are going to fly in the long run. I think we'll have an agreement, though. > I think the userspace<->kernel interface is going to be different than > either approach is doing now, You're probably right > and as such it's an opportunity to make more drastic changes if they are > appropriate. Well, maybe. > for example, why should we have LVM snapshot code and hibernate > snapshot/filesystem checkpoint code instead of just using the LVM code >
Re: BAD_SG_DMA panic in aha1542
Alan Cox wrote: > > As before, no problems using the sda hard disk (which is the boot drive): > > everything works reliably until I touch the cdrom drive. > > A little quiet contemplation and gnome number 387 suggests trying the > following > (and providing more detailed information such as the last message printed > before > the DMA message). Stuff a BUG() before the panic in BAD_DMA (aha1542.c) if > needed > to get a good trace. > > Please report success/failure/change. Can do. I don't have access to the machine on weekends, so it will be at least Monday before I can give this a whirl. Thanks! -- --- Bob Tracy WTO + WIPO = DMCA? http://www.anti-dmca.org [EMAIL PROTECTED] ---
Re: Back to the future.
Hi. On Sat, 2007-04-28 at 01:45 +0200, Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 01:17, Linus Torvalds wrote: > > > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > > > And can you name a _single_ advantage of doing so? > > > > > > Yes. We have a lot less interdependencies to worry about during the whole > > > operation. > > > > That's not an advantage. That's why it has *sucked*. > > Actually, the less things happen while we're creating and saving the image, > the less sources of potential problems there are and by freezing the kernel > threads (not all of them), we cause less things to happen at that time. > > To make you happy, we could stop doing that, but what actual _advantage_ > that would bring? A couple of other advantages to freezing other processes: 1) It makes predicting how much memory is available for making and saving snapshot a tractable problem. It therefore makes hibernation _much_ more reliable. 2) Racing against other processes would also make hibernation slower, increasing the chances of your battery running out before the save is complete. 3) It makes finding potential memory leaks in the code possible. It was ages ago now, but at one stage I could display a table saying exactly how many pages had been allocated and freed by different sections of the process and compare the number of free pages at the start and end of the cycle to ensure there were no memory leaks at all. > > Trying to freeze kernel threads has _caused_ problems. It has _added_ > > these interdependencies. It hasn't removed a single dependency at any > > time, it has just added new problems! > > What problems are you talking about? 
> > > > 1) if the kernel threads are frozen, we know that they don't hold any > > > locks > > > that could interfere with the freezing of device drivers, > > > 2) if they are frozen, we know, for example, that they won't call user > > > mode > > > helpers or do similar things, > > > 3) if they are frozen, we know that they won't submit I/O to disks and > > > potentially damage filesystems (suspend2 has much more problems with that > > > than swsusp, but still. And yes, there have been bug reports related to > > > it, > > > so it's not just my fantasy). > > > > NONE of these are valid explanations at all. You're listing totally > > theoretical problems, and ignoring all the _real_ problems that trying to > > freeze kernel threads has _caused_. > > Example, please? I agree with Rafael. Freezing processes greatly helps in ensuring we have a consistent image. He's right, too, in asserting that it's even more important for Suspend2. Freezing processes is essential to being able to know that those LRU pages won't change and therefore being able to save them separately and then reuse them for the atomic copy. > > If you want to control user-mode helpers, you do that - you do not freeze > > kernel threads! > > > > And no, kernel threads do not submit IO to disks on their own. You just > > made that up. > > No, I didn't. Nigel can confirm, I think. I have had problems with MD threads generating I/O that I couldn't account for - after userspace had been frozen, filesystems had been nicely synced and so on. I have to speak with reservations though, because I haven't yet gotten to the bottom of where the I/O is coming from... too many things, too small time slices. > > Yes, they can be involved in that whole disk submission thing, but in a good > > way - they can be required in order to make disk writing work! > > Some of them can be, some other's need not be. We don't need any fs-related > kernel threads for saving the image, for example. 
Yeah, so long as we bmap the storage we want to use beforehand (thinking of swap files and ordinary files). > > The problem that suspend has had is that it's done everything totally the > > wrong way around. Do kernel threads do disk IO? Sure, if asked to do so. > > They can be asked before we do the snapshot and complete the operation > afterwards, no? > > > For example, kernel threads can be involved in md etc, but that's a *good* > > thing. > > We don't freeze these threads. > > > The way to shut them up is not to freeze the threads, but to freeze the > > *disk*. > > In principle, you're right. In practice, go and try it. I have to disagree here. Freezing the disk instead of the threads is dealing with the symptoms instead of the cause. Regards, Nigel
Re: 2.6.21-rc7: known regressions
On Sat, Apr 28, 2007 at 12:38:07AM +0200, Michal Piotrowski wrote: > Hi all, > > Here is a list of known regressions reported after 2.6.21 release. if this was also on a wiki page... 1) contributors (also casual ones) may update it or add new entries 2) adding a "Forwarded-To:" field and a "renew" button, regression reports could be fired semi-automatically to the right recipients. also the casual reader might bug the proper maintainer simply by clicking on the button. grave regressions would get more clicks... 3) when the new release is cut, the page is converted and saved as the known regression list 4) a mail filter on lkml could perform some bookkeeping so people hating web could simply drop a message and the wiki page could update itself (no, not abuse itself) 5) web lovers could simply click on the links to lkml to dig into the regression it looks like a simple and rudimentary bug tracker, but it is only a regression reminder with links in the lkml flow. a distilled human-driven regression-oriented semi-automatic lkml archive. yes, some smart hybrid wiki/php|python thing with a db which could be used to automatically send regression reminders.. i'm not a web developer, i'm not able to suggest the right wiki-tool. i could offer some space on a tiny server with bandwidth but without any email capability. > Feel free to add new regressions/remove fixed etc. > http://lkr.wikidot.com/list what?!? it is already on a wiki page? doh! then read only the other non-wiki ideas... 'night domenico -[ Domenico Andreoli, aka cavok --[ http://www.dandreoli.com/gpgkey.asc ---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50
Re: BAD_SG_DMA panic in aha1542
James Bottomley wrote: > On Fri, 2007-04-27 at 16:47 -0500, Bob Tracy wrote: > > I previously reported an ISA DMA issue for the 2.6.12 kernel. The issue > > persists through at least 2.6.18. SCSI controller is an Adaptec > > AHA-1542B (ISA). > > > > The action "mount -t iso9660 /dev/scd0 /mnt/cdrom -r" > > > > produces > > > > (cdrom detection messages as various modules autoload, then...) > > Knowing what these messages are would be helpful; it tells me what > point in the initialisation it got to. Sorry about that... I'm running the DSL-N distribution (based on Knoppix), and having to transcribe the log messages by hand from the console, i.e., there's no logfile to cut-and-paste from :-(. I don't have access to the machine except on weekdays, but I'll repeat the crash first thing Monday morning and copy everything that's there... > I'm interested. > > This is clearly a use_sg==1 path that has failed to bounce the buffer > for some reason ... and I was contemplating eliminating the GFP_DMA from > our sr driver because I thought the block bouncing had it covered. > > It might also be helpful to apply this patch. It should give a stack > trace of the problem command and not immediately panic the box. I'll throw together a 2.6.21 kernel with this patch and give it a try. Again, it will be at least Monday before you hear back from me on this. Thanks! -- --- Bob Tracy WTO + WIPO = DMCA? http://www.anti-dmca.org [EMAIL PROTECTED] ---
Re: Question about Reiser4 (how to boot it?)
Thanks, that is certainly helpful, but that only mounts one directory (partition) as Reiser4. This I have already done. I was more interested in how to have a whole partition dedicated to Reiser4 and to be able to boot into it. By any chance did you do that? On Sat, 28 Apr 2007 00:37:05 +0800, "Jeff Chua" <[EMAIL PROTECTED]> said: > On 4/27/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > Hi Jeff, could you outline the procedure that YOU used to get Reiser4 > > installed and running. > > Pretty much the same as the steps from ... > http://linuxhelp.150m.com/installs/compile-kernel.htm > > cd /usr/src > tar --use=bzip2 -xpf linux-2.6.21.tar.bz2 > ln -nsf linux-2.6.21 linux > cd /usr/src/linux > bzip2 -d -c /tmp/reiser4-for-2.6.20.patch.bz2 | patch -p1 > # copy your old .config here > make menuconfig > File systems ---> > <*> Reiser4 (EXPERIMENTAL) > make > make modules_install > # copy ./i386/boot/bzImage to the boot directory > # reboot > > > # download, compile and install ... > libaal-1.0.5.tar.gz > reiser4progs-1.0.6.tar.gz > > I got them from ftp://ftp.namesys.com/pub/reiser4progs/ > > Take an unused partition, and create reiser4fs on it... > > mkfs.reiser4 /dev/sda8 > mount /dev/sda8 /mnt > > Or you may want to try it on a loop device ... > > dd if=/dev/zero of=disk1 bs=1024k count=100 > mkfs.reiser4 -yf disk1 > mount -o loop disk1 /u0 > > Here's an entry in /etc/fstab > /dev/sda8 /u3 reiser4 noatime 0 0 > > > I hope this is good enough to get you started. > > Thanks, > Jeff. -- [EMAIL PROTECTED] -- http://www.fastmail.fm - And now for something completely different
Re: Linux-2.6.21 hangs during post boot initialization phase
Neil Horman wrote:
> On Sat, Apr 28, 2007 at 12:28:28AM +1000, Peter Williams wrote:
>> Neil Horman wrote:
>>> On Fri, Apr 27, 2007 at 04:05:11PM +1000, Peter Williams wrote:
>
> Damn,
> This is what happens when I try to do things too quickly. I missed one spot
> in my last patch where I replaced skb with rx_skb. It's not critical, but it
> should improve sis900 performance by quite a bit. This applies on top of the
> last two patches. Sorry about that.
>
> Thanks & Regards
> Neil
>
> Signed-off-by: Neil Horman <[EMAIL PROTECTED]>
>
>  sis900.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c
> index 7e44939..db59dce 100644
> --- a/drivers/net/sis900.c
> +++ b/drivers/net/sis900.c
> @@ -1790,7 +1790,7 @@ static int sis900_rx(struct net_device *net_dev)
>  			/* give the socket buffer to upper layers */
>  			rx_skb = sis_priv->rx_skbuff[entry];
>  			skb_put(rx_skb, rx_size);
> -			skb->protocol = eth_type_trans(rx_skb, net_dev);
> +			rx_skb->protocol = eth_type_trans(rx_skb, net_dev);
>  			netif_rx(rx_skb);
>  
>  			/* some network statistics */

My system also boots OK after I add this patch. Can't tell whether it's improved the performance or not.

Peter
-- 
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
Re: [ANNOUNCE] battery2 git repository
On Fri, 27 Apr 2007 03:29:02 +0400 Anton Vorontsov <[EMAIL PROTECTED]> wrote: > You can get it using "git clone --reference linux-2.6 \ > git://git.infradead.org/users/cbou/battery2-2.6.git" command. I added this to the -mm lineup. Welcome to git. This means that nobody looks at your code any more and you get free rein to experiment with interesting innovations in VFS, MM and security in the mainline kernel (well, not really - Linus does squint at the diffstat). But we do have a general problem that code which travels the developer->git->mainline route is not getting sufficient review. Please be aware of this, and be as pushy as you like in sending your changes out to mailing lists (including linux-kernel) to get them reviewed. If you don't think they have received adequate review then send them again, and shout at people - we'd all admire that. Thanks.
Re: Back to the future.
On Saturday, 28 April 2007 01:17, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > And can you name a _single_ advantage of doing so? > > > > Yes. We have a lot less interdependencies to worry about during the whole > > operation. > > That's not an advantage. That's why it has *sucked*. Actually, the fewer things happen while we're creating and saving the image, the fewer sources of potential problems there are, and by freezing the kernel threads (not all of them), we cause fewer things to happen at that time. To make you happy, we could stop doing that, but what actual _advantage_ would that bring? > Trying to freeze kernel threads has _caused_ problems. It has _added_ > these interdependencies. It hasn't removed a single dependency at any > time, it has just added new problems! What problems are you talking about? > > 1) if the kernel threads are frozen, we know that they don't hold any locks > > that could interfere with the freezing of device drivers, > > 2) if they are frozen, we know, for example, that they won't call user mode > > helpers or do similar things, > > 3) if they are frozen, we know that they won't submit I/O to disks and > > potentially damage filesystems (suspend2 has much more problems with that > > than swsusp, but still. And yes, there have been bug reports related to it, > > so it's not just my fantasy). > > NONE of these are valid explanations at all. You're listing totally > theoretical problems, and ignoring all the _real_ problems that trying to > freeze kernel threads has _caused_. Example, please? > If you want to control user-mode helpers, you do that - you do not freeze > kernel threads! > > And no, kernel threads do not submit IO to disks on their own. You just > made that up. No, I didn't. Nigel can confirm, I think. > Yes, they can be involved in that whole disk submission thing, but in a good > way - they can be required in order to make disk writing work!
Some of them can be; others need not be. We don't need any fs-related kernel threads for saving the image, for example. > The problem that suspend has had is that it's done everything totally the > wrong way around. Do kernel threads do disk IO? Sure, if asked to do so. They can be asked before we do the snapshot and complete the operation afterwards, no? > For example, kernel threads can be involved in md etc, but that's a *good* > thing. We don't freeze these threads. > The way to shut them up is not to freeze the threads, but to freeze the > *disk*. In principle, you're right. In practice, go and try it. Anyway, why is it so important that _all_ of the kernel threads be running while the snapshot is created and saved?
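Linus's alternative, freezing the *disk* rather than the threads that submit to it, can be illustrated with a toy user-space model. In this model the device keeps accepting requests while frozen but holds them, then dispatches them on thaw, so no submitting thread ever has to be stopped. This is a hypothetical sketch of the idea only. `struct dev`, `dev_submit`, `dev_freeze` and `dev_thaw` are invented names, not the kernel's block-layer or freezer API, and the model ignores requests already in flight when the freeze starts.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a freezable device: requests submitted while
 * frozen are held, not dropped, and are dispatched on thaw. */
struct dev {
    pthread_mutex_t lock;
    bool frozen;
    int pending;    /* requests held while the device is frozen */
    int completed;  /* requests actually dispatched to the "hardware" */
};

static void dev_init(struct dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->frozen = false;
    d->pending = 0;
    d->completed = 0;
}

/* May be called from any thread; the caller is never parked for the
 * duration of the freeze, only its request is. */
static void dev_submit(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    if (d->frozen)
        d->pending++;
    else
        d->completed++;
    pthread_mutex_unlock(&d->lock);
}

static void dev_freeze(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->frozen = true;
    pthread_mutex_unlock(&d->lock);
}

/* Thaw: release everything that queued up while frozen. */
static void dev_thaw(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->frozen = false;
    d->completed += d->pending;
    d->pending = 0;
    pthread_mutex_unlock(&d->lock);
}
```

The design point being modeled is that the quiescing happens at the device boundary. Any thread, including ones needed to make disk writing work, can keep running without the device seeing new I/O until thaw.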
Re: [PATCH] utimensat implementation
On Fri, 27 Apr 2007, H. Peter Anvin wrote: > The main use of atime seems to be to figure out when something can be automatically deleted. Anyone else have other usage scenarios? As a variant of this, I use it to help determine what files are actually needed when building a chroot sandbox. David Lang
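The chroot technique David describes amounts to this: note a start time, exercise the program, then keep only files whose atime is at or after that time. It only works on filesystems not mounted `noatime`. Below is a minimal sketch of the per-file check. It also includes a helper that sets only the access time through utimensat(2), the call this thread's patch implements, with `UTIME_OMIT` leaving mtime untouched. Both helper names, `set_atime` and `accessed_since`, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD */
#include <stdbool.h>
#include <sys/stat.h>   /* stat(), utimensat(), UTIME_OMIT */
#include <time.h>

/* Hypothetical helper: set only the access time of path, leaving
 * the modification time unchanged. Returns 0 on success. */
static int set_atime(const char *path, time_t t)
{
    struct timespec ts[2] = {
        { .tv_sec = t, .tv_nsec = 0 },          /* ts[0]: atime */
        { .tv_sec = 0, .tv_nsec = UTIME_OMIT }, /* ts[1]: keep mtime */
    };
    return utimensat(AT_FDCWD, path, ts, 0);
}

/* Hypothetical helper: true if path was last accessed at or after
 * `since`; false if it cannot be stat()ed. stat() itself does not
 * update atime. */
static bool accessed_since(const char *path, time_t since)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return st.st_atime >= since;
}
```

For the sandbox case, record `time(NULL)` before running the program. Afterwards, walk the candidate files and keep those for which `accessed_since(path, start)` is true.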