Re: [vdpa_sim_net] 79991caf52: net/ipv4/ipmr.c:#RCU-list_traversed_in_non-reader_section
On 2/7/21 12:15 PM, Dongli Zhang wrote:
> Is it possible that the issue is not due to this change?

It looks like this issue is not related to your change. From the dmesg
output, virtio was not loaded when the issue occurred:

[ 502.508450] [ cut here ]
[ 502.511859] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/vkms/vkms_crtc.c:21 vkms_vblank_simulate+0x22a/0x240
[ 502.524018] Modules linked in:
[ 502.539642] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc4-8-g79991caf5202 #1

>
> This change is just to call different API to allocate memory, which is
> equivalent to kzalloc()+vzalloc().
>
> Before the change:
>
> try kzalloc(sizeof(*vs), GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL);
>
> ... and then below if the former is failed.
>
> vzalloc(sizeof(*vs));
>
>
> After the change:
>
> try kmalloc_node(size, GFP_KERNEL|GFP_ZERO|__GFP_NOWARN|__GFP_NORETRY, node);
>
> ... and then below if the former is failed
>
> __vmalloc_node(size, 1, GFP_KERNEL|GFP_ZERO, node,
> __builtin_return_address(0));
>
>
> The below is the first WARNING in uploaded dmesg. I assume it was called before
> to open /dev/vhost-scsi.
>
> Will this test try to open /dev/vhost-scsi?
>
> [5.095515] =
> [5.095515] WARNING: suspicious RCU usage
> [5.095515] 5.11.0-rc4-8-g79991caf5202 #1 Not tainted
> [5.095534] -
> [5.096041] security/smack/smack_lsm.c:351 RCU-list traversed in non-reader section!!
> [5.096982]
> [5.096982] other info that might help us debug this:
> [5.096982]
> [5.097953]
> [5.097953] rcu_scheduler_active = 1, debug_locks = 1
> [5.098739] no locks held by kthreadd/2.
> [5.099237]
> [5.099237] stack backtrace:
> [5.099537] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.11.0-rc4-8-g79991caf5202 #1
> [5.100470] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [5.101442] Call Trace:
> [5.101807] dump_stack+0x15f/0x1bf
> [5.102298] smack_cred_prepare+0x400/0x420
> [5.102840] ? security_prepare_creds+0xd4/0x120
> [5.103441] security_prepare_creds+0x84/0x120
> [5.103515] prepare_creds+0x3f1/0x580
> [5.103515] copy_creds+0x65/0x480
> [5.103515] copy_process+0x7b4/0x3600
> [5.103515] ? check_prev_add+0xa40/0xa40
> [5.103515] ? lockdep_enabled+0xd/0x60
> [5.103515] ? lock_is_held_type+0x1a/0x100
> [5.103515] ? __cleanup_sighand+0xc0/0xc0
> [5.103515] ? lockdep_unlock+0x39/0x160
> [5.103515] kernel_clone+0x165/0xd20
> [5.103515] ? copy_init_mm+0x20/0x20
> [5.103515] ? pvclock_clocksource_read+0xd9/0x1a0
> [5.103515] ? sched_clock_local+0x99/0xc0
> [5.103515] ? kthread_insert_work_sanity_check+0xc0/0xc0
> [5.103515] kernel_thread+0xba/0x100
> [5.103515] ? __ia32_sys_clone3+0x40/0x40
> [5.103515] ? kthread_insert_work_sanity_check+0xc0/0xc0
> [5.103515] ? do_raw_spin_unlock+0xa9/0x160
> [5.103515] kthreadd+0x68f/0x7a0
> [5.103515] ? kthread_create_on_cpu+0x160/0x160
> [5.103515] ? lockdep_hardirqs_on+0x77/0x100
> [5.103515] ? _raw_spin_unlock_irq+0x24/0x60
> [5.103515] ? kthread_create_on_cpu+0x160/0x160
> [5.103515] ret_from_fork+0x22/0x30
>
> Thank you very much!
>
> Dongli Zhang
>
>
> On 2/6/21 7:03 PM, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: 79991caf5202c7989928be534727805f8f68bb8d ("vdpa_sim_net: Add support for user supported devices")
>> https://urldefense.com/v3/__https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git__;!!GqivPVa7Brio!LfgrgVVtPAjwjqTZX8yANgsix4f3cJmAA_CcMeCVymh5XYcamWdR9dnbIQA-p61PJtI$
>>
>> Dongli-Zhang/vhost-scsi-alloc-vhost_scsi-with-kvzalloc-to-avoid-delay/20210129-191605
>>
>>
>> in testcase: trinity
>> version: trinity-static-x86_64-x86_64-f93256fb_2019-08-28
>> with following parameters:
>>
>> runtime: 300s
>>
>> test-description: Trinity is a linux system call fuzz tester.
>> test-url:
>> https://urldefense.com/v3/__http://codemonkey.org.uk/projects/trinity/__;!!GqivPVa7Brio!LfgrgVVtPAjwjqTZX8yANgsix4f3cJmAA_CcMeCVymh5XYcamWdR9dnbIQA-6Y4x88c$
>>
>>
>>
>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>
>>
>> +----------------+------------+------------+
>> |                | 39502d042a | 79991caf52 |
>> +----------------+------------+------------+
>> | boot_successes | 0          | 0          |
>> | boot_failures  | 62         | 57         |
>> |
Re: [PATCH v2 1/1] vhost scsi: alloc vhost_scsi with kvzalloc() to avoid delay
Could anyone help review this patch and give a Reviewed-by for it, please?

Thanks,
Joe

On 1/24/21 7:12 PM, Jason Wang wrote:
>
> On 2021/1/23 4:08 PM, Dongli Zhang wrote:
>> The size of 'struct vhost_scsi' is order-10 (~2.3MB). It may take long time
>> delay by kzalloc() to compact memory pages by retrying multiple times when
>> there is a lack of high-order pages. As a result, there is latency to
>> create a VM (with vhost-scsi) or to hotadd vhost-scsi-based storage.
>>
>> The prior commit 595cb754983d ("vhost/scsi: use vmalloc for order-10
>> allocation") prefers to fallback only when really needed, while this patch
>> allocates with kvzalloc() with __GFP_NORETRY implicitly set to avoid
>> retrying memory pages compact for multiple times.
>>
>> The __GFP_NORETRY is implicitly set if the size to allocate is more than
>> PAGE_SIZE and when __GFP_RETRY_MAYFAIL is not explicitly set.
>>
>> Cc: Aruna Ramakrishna
>> Cc: Joe Jin
>> Signed-off-by: Dongli Zhang
>> ---
>> Changed since v1:
>> - To combine kzalloc() and vzalloc() as kvzalloc()
>> (suggested by Jason Wang)
>>
>>  drivers/vhost/scsi.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>> index 4ce9f00ae10e..5de21ad4bd05 100644
>> --- a/drivers/vhost/scsi.c
>> +++ b/drivers/vhost/scsi.c
>> @@ -1814,12 +1814,9 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
>>      struct vhost_virtqueue **vqs;
>>      int r = -ENOMEM, i;
>>
>> -    vs = kzalloc(sizeof(*vs), GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL);
>> -    if (!vs) {
>> -        vs = vzalloc(sizeof(*vs));
>> -        if (!vs)
>> -            goto err_vs;
>> -    }
>> +    vs = kvzalloc(sizeof(*vs), GFP_KERNEL);
>> +    if (!vs)
>> +        goto err_vs;
>>
>>      vqs = kmalloc_array(VHOST_SCSI_MAX_VQ, sizeof(*vqs), GFP_KERNEL);
>>      if (!vqs)
>
>
> Acked-by: Jason Wang
>
>
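
As a side note for readers of this thread, the "order-10" figure in the
commit message can be checked with a small userspace sketch. The helper
name alloc_order() is hypothetical; it only mimics the kind of rounding the
kernel's get_order() performs. The 4 KiB page size and the byte count used
for "~2.3MB" are assumptions for illustration, not values taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: return the smallest order such that
 * (page_size << order) >= size. Not kernel code.
 */
static unsigned int alloc_order(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    assert(size > 0 && page_size > 0);

    /* Double the span until it covers the requested size. */
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With these assumptions, a request of roughly 2.3 MB needs 2^10 contiguous
4 KiB pages (4 MB), which is why a physically contiguous allocation of this
structure can be slow or fail when memory is fragmented.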
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On 4/28/20 10:25 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 28, 2020 at 12:19:41PM +0200, Jürgen Groß wrote:
>> On 28.04.20 10:25, Peng Fan wrote:
>
> Adding Joe Jin.
>
> Joe, didn't you have some ideas on how this could be implemented?
>
>>>> Subject: Re: [PATCH] xen/swiotlb: correct the check for
>>>> xen_destroy_contiguous_region
>>>>
>>>> On 28.04.20 09:33, peng@nxp.com wrote:
>>>>> From: Peng Fan
>>>>>
>>>>> When booting xen on i.MX8QM, met:
>>>>> "
>>>>> [3.602128] Unable to handle kernel paging request at virtual address 00272d40
>>>>> [3.610804] Mem abort info:
>>>>> [3.613905] ESR = 0x9604
>>>>> [3.617332] EC = 0x25: DABT (current EL), IL = 32 bits
>>>>> [3.623211] SET = 0, FnV = 0
>>>>> [3.626628] EA = 0, S1PTW = 0
>>>>> [3.630128] Data abort info:
>>>>> [3.633362] ISV = 0, ISS = 0x0004
>>>>> [3.637630] CM = 0, WnR = 0
>>>>> [3.640955] [00272d40] user address but active_mm is swapper
>>>>> [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
>>>>> [3.654137] Modules linked in:
>>>>> [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
>>>>> [3.677302] Workqueue: events deferred_probe_work_func
>>>>> [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
>>>>> [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
>>>>> [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
>>>>> [3.693993] pci_bus :00: root bus resource [bus 00-ff]
>>>>> [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
>>>>> "
>>>>>
>>>>> In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
>>>>> or range_straddles_page_boundary(phys, size) are true, it will create
>>>>> contiguous region. So when free, we need to free contiguous region use
>>>>> upper check condition.
>>>>
>>>> No, this will break PV guests on x86.
>>>
>>> Could you share more details why alloc and free not matching for the check?
>>
>> xen_create_contiguous_region() is needed only in case:
>>
>> - the bus address is not within dma_mask, or
>> - the memory region is not physically contiguous (can happen only for
>>   PV guests)
>>
>> In any case it should arrange for the memory to be suitable for the
>> DMA operation, so to be contiguous and within dma_mask afterwards. So
>> xen_destroy_contiguous_region() should only ever called for areas
>> which match above criteria, as otherwise we can be sure
>> xen_create_contiguous_region() was not used for making the area DMA-able
>> in the beginning.

I agree with Juergen's explanation; that is my understanding as well.

Peng, if the panic was caused by (dev_addr + size - 1 > dma_mask), you
should check how you got the address. If the memory was created by
xen_create_contiguous_region(), it must be within [0, dma_mask].

Thanks,
Joe

>>
>> And this is very important in the PV case, as in those guests the page
>> tables are containing the host-PFNs, not the guest-PFNS, and
>> xen_create_contiguous_region() will fiddle with host- vs. guest-PFN
>> arrangements, and xen_destroy_contiguous_region() is reverting this
>> fiddling. Any call of xen_destroy_contiguous_region() for an area it
>> was not intended to be called for might swap physical pages beneath
>> random virtual addresses, which was the reason for this test to be
>> added by me.
>>
>>
>> Juergen
>>
>>>
>>> Thanks,
>>> Peng.
>>>
>>>>
>>>> I think there is something wrong with your setup in combination with the
>>>> ARM xen_create_contiguous_region() implementation.
>>>>
>>>> Stefano?
>>>>
>>>>
>>>> Juergen
>>>>
>>>>>
>>>>> Signed-off-by: Peng Fan
>>>>> ---
>>>>>  drivers/xen/swiotlb-xen.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
>>>>> index b6d27762c6f8..ab96e468584f 100644
>>>>> --- a/drivers/xen/swiotlb-xen.c
>>>>> +++ b/drivers/xen/swiotlb-xen.c
>>>>> @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
>>>>>      /* Convert the size to actually allocated. */
>>>>>      size = 1UL << (order + XEN_PAGE_SHIFT);
>>>>>
>>>>> -    if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
>>>>> -             range_straddles_page_boundary(phys, size)) &&
>>>>> +    if (((dev_addr + size - 1 > dma_mask) ||
>>>>> +        range_straddles_page_boundary(phys, size)) &&
>>>>>          TestClearPageXenRemapped(virt_to_page(vaddr)))
>>>>>          xen_destroy_contiguous_region(phys, order);
>>>>>
>>>>>
>>>
>>
Re: [PATCH] tracing: make exported ftrace_set_clr_event non-static
Patch looks good to me.

Reviewed-by: Joe Jin

Thanks,
Joe

On 7/4/19 10:21 AM, Denis Efremov wrote:
> The function ftrace_set_clr_event is declared static and marked
> EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the
> function was decided to be a part of API, this commit removes the static
> attribute and adds the declaration to the header.
>
> Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances")
> Signed-off-by: Denis Efremov
> ---
>  include/linux/trace_events.h | 1 +
>  kernel/trace/trace_events.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 8a62731673f7..84bc84f00e8f 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -539,6 +539,7 @@ extern int trace_event_get_offsets(struct trace_event_call *call);
>
>  #define is_signed_type(type) (((type)(-1)) < (type)1)
>
> +int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
>  int trace_set_clr_event(const char *system, const char *event, int set);
>
>  /*
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 0ce3db67f556..b6b46184f6bf 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
>      return ret;
>  }
>
> -static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
> +int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
>  {
>      char *event = NULL, *sub = NULL, *match;
>      int ret;
>
Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage
On 12/10/18 12:00 PM, Tim Chen wrote:
>> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>          dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
>>      return SWIOTLB_MAP_ERROR;
>>  found:
>> +#ifdef CONFIG_DEBUG_FS
>> +    io_tlb_used += nslots;
>> +#endif
> One nit I have about this patch is there are too many CONFIG_DEBUG_FS.
>
> For example here, instead of io_tlb_used, we can have a macro defined,
> perhaps something like inc_iotlb_used(nslots). It can be placed in the
> same section that swiotlb_create_debugfs is defined so there's a single
> place where all the CONFIG_DEBUG_FS stuff is located.
>
> Then define inc_iotlb_used to be null when we don't have
> CONFIG_DEBUG_FS.
>

Dongli removed the above #ifdef/#endif in his next patch,
"[PATCH v2 2/2] swiotlb: checking whether swiotlb buffer is full with io_tlb_used".

Thanks,
Joe
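
For reference, the pattern Tim describes could look roughly like the
userspace sketch below. Only the name inc_iotlb_used() comes from the
thread; dec_iotlb_used() and iotlb_used() are hypothetical companions, the
local #define merely stands in for the Kconfig symbol, and inline functions
are used here instead of macros. This is an illustration of the idea, not
code from either patch.

```c
#include <assert.h>

/* Assumption: stands in for the Kconfig symbol; remove to model a
 * build without debugfs support. */
#define CONFIG_DEBUG_FS 1

#ifdef CONFIG_DEBUG_FS
/* Number of slots currently accounted as in use. */
static unsigned long io_tlb_used;

static inline void inc_iotlb_used(unsigned long nslots)
{
    io_tlb_used += nslots;
}

static inline void dec_iotlb_used(unsigned long nslots)
{
    assert(io_tlb_used >= nslots); /* guard against underflow */
    io_tlb_used -= nslots;
}

static inline unsigned long iotlb_used(void)
{
    return io_tlb_used;
}
#else
/* Without debugfs the helpers do nothing, so callers need no #ifdef. */
static inline void inc_iotlb_used(unsigned long nslots) { (void)nslots; }
static inline void dec_iotlb_used(unsigned long nslots) { (void)nslots; }
static inline unsigned long iotlb_used(void) { return 0; }
#endif
```

The call sites would then read inc_iotlb_used(nslots) and
dec_iotlb_used(nslots) unconditionally, keeping every CONFIG_DEBUG_FS test
in one place.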
Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage
On 12/9/18 4:37 PM, Dongli Zhang wrote:
> The device driver will not be able to do dma operations once swiotlb buffer
> is full, either because the driver is using so many IO TLB blocks inflight,
> or because there is memory leak issue in device driver. To export the
> swiotlb buffer usage via debugfs would help the user estimate the size of
> swiotlb buffer to pre-allocate or analyze device driver memory leak issue.
>
> Signed-off-by: Dongli Zhang

Reviewed-by: Joe Jin

> ---
> Changed since v1:
> * init debugfs with late_initcall (suggested by Robin Murphy)
> * create debugfs entries with debugfs_create_ulong (suggested by Robin Murphy)
>
>  kernel/dma/swiotlb.c | 50 ++
>  1 file changed, 50 insertions(+)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..3979c2c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include
>  #include
>  #include
> +#ifdef CONFIG_DEBUG_FS
> +#include
> +#endif
>
>  #include
>  #include
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>          dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
>      return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> +    io_tlb_used += nslots;
> +#endif
>      spin_unlock_irqrestore(&io_tlb_lock, flags);
>
>      /*
> @@ -588,6 +601,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>           */
>          for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>              io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> +        io_tlb_used -= nslots;
> +#endif
>      }
>      spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> @@ -883,3 +900,36 @@ const struct dma_map_ops swiotlb_dma_ops = {
>      .dma_supported = dma_direct_supported,
>  };
>  EXPORT_SYMBOL(swiotlb_dma_ops);
> +
> +#ifdef CONFIG_DEBUG_FS
> +
> +static int __init swiotlb_create_debugfs(void)
> +{
> +    static struct dentry *d_swiotlb_usage;
> +    struct dentry *ent;
> +
> +    d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> +    if (!d_swiotlb_usage)
> +        return -ENOMEM;
> +
> +    ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
> +                   d_swiotlb_usage, &io_tlb_nslabs);
> +    if (!ent)
> +        goto fail;
> +
> +    ent = debugfs_create_ulong("io_tlb_used", 0400,
> +                   d_swiotlb_usage, &io_tlb_used);
> +    if (!ent)
> +        goto fail;
> +
> +    return 0;
> +
> +fail:
> +    debugfs_remove_recursive(d_swiotlb_usage);
> +    return -ENOMEM;
> +}
> +
> +late_initcall(swiotlb_create_debugfs);
> +
> +#endif
>

--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Director
ORACLE | Linux and Virtualization
500 Oracle Parkway
Redwood City, CA US 94065
Re: [PATCH v2 2/2] swiotlb: checking whether swiotlb buffer is full with io_tlb_used
On 12/9/18 4:37 PM, Dongli Zhang wrote:
> This patch uses io_tlb_used to help check whether swiotlb buffer is full.
> io_tlb_used is no longer used for only debugfs. It is also used to help
> optimize swiotlb_tbl_map_single().
>
> Suggested-by: Joe Jin
> Signed-off-by: Dongli Zhang

Reviewed-by: Joe Jin

> ---
>  kernel/dma/swiotlb.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3979c2c..9300341 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -76,12 +76,10 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>
> -#ifdef CONFIG_DEBUG_FS
>  /*
>   * The number of used IO TLB block
>   */
>  static unsigned long io_tlb_used;
> -#endif
>
>  /*
>   * This is a free list describing the number of free entries available from
> @@ -489,6 +487,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>       * request and allocate a buffer from that IO TLB pool.
>       */
>      spin_lock_irqsave(&io_tlb_lock, flags);
> +
> +    if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
> +        goto not_found;
> +
>      index = ALIGN(io_tlb_index, stride);
>      if (index >= io_tlb_nslabs)
>          index = 0;
> @@ -538,9 +540,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>          dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
>      return SWIOTLB_MAP_ERROR;
>  found:
> -#ifdef CONFIG_DEBUG_FS
>      io_tlb_used += nslots;
> -#endif
>      spin_unlock_irqrestore(&io_tlb_lock, flags);
>
>      /*
> @@ -602,9 +602,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>          for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>              io_tlb_list[i] = ++count;
>
> -#ifdef CONFIG_DEBUG_FS
>          io_tlb_used -= nslots;
> -#endif
>      }
>      spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
>

--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Director
ORACLE | Linux and Virtualization
500 Oracle Parkway
Redwood City, CA US 94065
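
The early bail-out added by this patch can be modelled in userspace as
follows. The struct and function names are hypothetical; only the
comparison itself is taken from the diff above. The sketch assumes the
counter never exceeds the pool size, which is what keeps the unsigned
subtraction from wrapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters named in the patch. */
struct iotlb_pool {
    unsigned long nslabs; /* total slots, like io_tlb_nslabs */
    unsigned long used;   /* slots in use, like io_tlb_used */
};

/*
 * Mirror of the patch's test "nslots > io_tlb_nslabs - io_tlb_used":
 * true when the request cannot possibly fit in the free slots.
 */
static bool iotlb_cannot_fit(const struct iotlb_pool *pool,
                             unsigned long nslots)
{
    assert(pool->used <= pool->nslabs); /* subtraction must not wrap */
    return nslots > pool->nslabs - pool->used;
}
```

Passing this test only means enough slots are free in total; presumably the
search that follows in swiotlb_tbl_map_single() can still fail if it finds
no suitable run of slots.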
Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage
On 12/6/18 9:49 PM, Dongli Zhang wrote:
>
>
> On 12/07/2018 12:12 AM, Joe Jin wrote:
>> Hi Dongli,
>>
>> Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs():
>
> I assume the call of swiotlb_tbl_map_single() might be frequent in some
> situations, e.g., when 'swiotlb=force'.
>
> That's why I declare the d_swiotlb_usage out of any functions and use "if
> (unlikely(!d_swiotlb_usage))".

This is reasonable.

Thanks,
Joe

>
> I think "if (unlikely(!d_swiotlb_usage))" incur less performance overhead than
> calling swiotlb_create_debugfs() every time to confirm if debugfs is created. I
> would declare d_swiotlb_usage statically inside swiotlb_create_debugfs() if the
> performance overhead is acceptable (it is trivial indeed).
>
>
> That is the reason I tag the patch with RFC because I am not sure if the
> on-demand creation of debugfs is fine with maintainers/reviewers. If swiotlb
> pages are never allocated, we would not be able to see the debugfs entry.
>
> I would prefer to limit the modification within swiotlb and to not taint any
> other files.
>
> The drawback is there is no place to create or delete the debugfs entry because
> swiotlb buffer could be initialized and uninitialized at very early stage.
>
>>
>> void swiotlb_create_debugfs(void)
>> {
>> #ifdef CONFIG_DEBUG_FS
>>     static struct dentry *d_swiotlb_usage = NULL;
>>
>>     if (d_swiotlb_usage)
>>         return;
>>
>>     d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
>>
>>     if (!d_swiotlb_usage)
>>         return;
>>
>>     debugfs_create_file("usage", 0600, d_swiotlb_usage,
>>                 NULL, &swiotlb_usage_fops);
>> #endif
>> }
>>
>> And for io_tlb_used, possible add a check at the begin of
>> swiotlb_tbl_map_single(),
>> if there were not any free slots or not enough slots, return fail directly?
>
> This would optimize the slots allocation path. I will follow this in next
> version after I got more suggestions and confirmations from maintainers.
>
>
> Thank you very much!
>
> Dongli Zhang
>
>>
>> Thanks,
>> Joe
>> On 12/5/18 7:59 PM, Dongli Zhang wrote:
>>> The device driver will not be able to do dma operations once swiotlb buffer
>>> is full, either because the driver is using so many IO TLB blocks inflight,
>>> or because there is memory leak issue in device driver. To export the
>>> swiotlb buffer usage via debugfs would help the user estimate the size of
>>> swiotlb buffer to pre-allocate or analyze device driver memory leak issue.
>>>
>>> As the swiotlb can be initialized at very early stage when debugfs cannot
>>> register successfully, this patch creates the debugfs entry on demand.
>>>
>>> Signed-off-by: Dongli Zhang
>>> ---
>>>  kernel/dma/swiotlb.c | 57
>>>  1 file changed, 57 insertions(+)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 045930e..d3c8aa4 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -35,6 +35,9 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#ifdef CONFIG_DEBUG_FS
>>> +#include
>>> +#endif
>>>
>>>  #include
>>>  #include
>>> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>>>   */
>>>  static unsigned long io_tlb_nslabs;
>>>
>>> +#ifdef CONFIG_DEBUG_FS
>>> +/*
>>> + * The number of used IO TLB block
>>> + */
>>> +static unsigned long io_tlb_used;
>>> +#endif
>>> +
>>>  /*
>>>   * This is a free list describing the number of free entries available from
>>>   * each index
>>> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>>>
>>>  static int late_alloc;
>>>
>>> +#ifdef CONFIG_DEBUG_FS
>>> +
>>> +static struct dentry *d_swiotlb_usage;
>>> +
>>> +static int swiotlb_usage_show(struct seq_file *m, void *v)
>>> +{
>>> +    seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
>>> +    return 0;
>>> +}
>>> +
>>> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
>>> +{
>>> +    return single_open(filp, swiotlb_usage_show, NULL);
>>> +}
>>> +
>>> +static const struc
Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage
Hi Dongli,

Maybe move the declaration of d_swiotlb_usage into swiotlb_create_debugfs():

void swiotlb_create_debugfs(void)
{
#ifdef CONFIG_DEBUG_FS
    static struct dentry *d_swiotlb_usage = NULL;

    if (d_swiotlb_usage)
        return;

    d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);

    if (!d_swiotlb_usage)
        return;

    debugfs_create_file("usage", 0600, d_swiotlb_usage,
                NULL, &swiotlb_usage_fops);
#endif
}

And for io_tlb_used, could you add a check at the beginning of
swiotlb_tbl_map_single() that fails directly if there are no free slots,
or not enough of them?

Thanks,
Joe

On 12/5/18 7:59 PM, Dongli Zhang wrote:
> The device driver will not be able to do dma operations once swiotlb buffer
> is full, either because the driver is using so many IO TLB blocks inflight,
> or because there is memory leak issue in device driver. To export the
> swiotlb buffer usage via debugfs would help the user estimate the size of
> swiotlb buffer to pre-allocate or analyze device driver memory leak issue.
>
> As the swiotlb can be initialized at very early stage when debugfs cannot
> register successfully, this patch creates the debugfs entry on demand.
>
> Signed-off-by: Dongli Zhang
> ---
>  kernel/dma/swiotlb.c | 57
>  1 file changed, 57 insertions(+)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..d3c8aa4 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include
>  #include
>  #include
> +#ifdef CONFIG_DEBUG_FS
> +#include
> +#endif
>
>  #include
>  #include
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>
>  static int late_alloc;
>
> +#ifdef CONFIG_DEBUG_FS
> +
> +static struct dentry *d_swiotlb_usage;
> +
> +static int swiotlb_usage_show(struct seq_file *m, void *v)
> +{
> +    seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
> +    return 0;
> +}
> +
> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
> +{
> +    return single_open(filp, swiotlb_usage_show, NULL);
> +}
> +
> +static const struct file_operations swiotlb_usage_fops = {
> +    .open = swiotlb_usage_open,
> +    .read = seq_read,
> +    .llseek = seq_lseek,
> +    .release= single_release,
> +};
> +
> +void swiotlb_create_debugfs(void)
> +{
> +    d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> +    if (!d_swiotlb_usage)
> +        return;
> +
> +    debugfs_create_file("usage", 0600, d_swiotlb_usage,
> +                NULL, &swiotlb_usage_fops);
> +}
> +
> +#endif
> +
>  static int __init
>  setup_io_tlb_npages(char *str)
>  {
> @@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>          pr_warn_once("%s is active and system is using DMA bounce buffers\n",
>                   sme_active() ? "SME" : "SEV");
>
> +#ifdef CONFIG_DEBUG_FS
> +    if (unlikely(!d_swiotlb_usage))
> +        swiotlb_create_debugfs();
> +#endif
> +
>      mask = dma_get_seg_boundary(hwdev);
>
>      tbl_dma_addr &= mask;
> @@ -528,6 +578,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>          dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
>      return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> +    io_tlb_used += nslots;
> +#endif
>      spin_unlock_irqrestore(&io_tlb_lock, flags);
>
>      /*
> @@ -588,6 +641,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>           */
>          for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>              io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> +        io_tlb_used -= nslots;
> +#endif
>      }
>      spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
>
Re: [PATCH 4.4 010/268] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
On 6/7/18 1:28 PM, Ben Hutchings wrote:
> On Mon, 2018-05-28 at 11:59 +0200, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>
>> ----------
>>
>> From: Joe Jin
>>
>> commit 4855c92dbb7b3b85c23e88ab7ca04f99b9677b41 upstream.
>>
>> When run raidconfig from Dom0 we found that the Xen DMA heap is reduced,
>> but Dom Heap is increased by the same size. Tracing raidconfig we found
>> that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
>> to apply memory. If the memory allocated by Dom0 is not in the DMA area,
>> it will exchange memory with Xen to meet the requirement. Later drivers
>> call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent()
>> the check condition (dev_addr + size - 1 <= dma_mask) is always false,
>
> I think this was meant to say (dev_addr + size - 1 > dma_mask), i.e.

Hi Ben,

Yes, you are right. Sorry, I made a mistake; thanks for catching it.

Is there any way to fix the description in the git repo?

Regards,
Joe

> the condition that is replaced by this commit. If that's always false,
> the new condition (the logical inverse) must always be true.
>
> [...]
>> --- a/drivers/xen/swiotlb-xen.c
>> +++ b/drivers/xen/swiotlb-xen.c
>> @@ -359,7 +359,7 @@ xen_swiotlb_free_coherent(struct device
>>       * physical address */
>>      phys = xen_bus_to_phys(dev_addr);
>>
>> -    if (((dev_addr + size - 1 > dma_mask)) ||
>> +    if (((dev_addr + size - 1 <= dma_mask)) ||
>>          range_straddles_page_boundary(phys, size))
>>          xen_destroy_contiguous_region(phys, order);
>>
>
> So now we will always call xen_destroy_contiguous_region(), whether or
> not xen_create_contiguous_region() was called during allocation. Is
> that really the intent? If so, the entire condition could be removed
> to make this clear.
>
> Alternately, if the commit message is correct, the condition could be
> simplified to range_straddles_page_boundary(...).
>
> But I'm not at all convinced that either of these is correct. It seems
> like you need to either find a way of distinguishing between memory
> allocated with or without the use of xen_create_contiguous_region(), or
> to use it unconditionally.
>
> Ben.
>

--
Oracle <http://www.oracle.com>
Joe Jin | IT Director
ORACLE | Production Engineering and Operations
600 Oracle Parkway
Redwood City, CA US 94065
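
To make Ben's point concrete, here is a minimal userspace sketch of the
free-path test before and after the commit. The function names are
hypothetical, the boolean parameter stands in for the result of
range_straddles_page_boundary(), and the addresses in the note below are
made-up values chosen only to exercise both branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition in xen_swiotlb_free_coherent() before the commit. */
static bool free_check_before(uint64_t dev_addr, uint64_t size,
                              uint64_t dma_mask, bool straddles)
{
    assert(size > 0);
    return (dev_addr + size - 1 > dma_mask) || straddles;
}

/* Condition after the commit: the mask comparison is inverted. */
static bool free_check_after(uint64_t dev_addr, uint64_t size,
                             uint64_t dma_mask, bool straddles)
{
    assert(size > 0);
    return (dev_addr + size - 1 <= dma_mask) || straddles;
}
```

For a buffer that already sits below a 32-bit mask and does not straddle a
page boundary, the old test is false and the new test is true, so under
these assumptions the new code would call xen_destroy_contiguous_region()
even for a buffer that never needed xen_create_contiguous_region(). That is
the case Ben is asking about.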
[PATCH] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
When running raidconfig from Dom0 we found that the Xen DMA heap is reduced,
but Dom Heap is increased by the same size. Tracing raidconfig we found
that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
it will exchange memory with Xen to meet the requirement. Later drivers
call dma_free_coherent() to free the memory; in xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
which prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.

This issue was introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".

Signed-off-by: Joe Jin <joe@oracle.com>
Tested-by: John Sobecki <john.sobe...@oracle.com>
Reviewed-by: Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: sta...@vger.kernel.org
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c60899fdbc..a6f9ba85dc4b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -351,7 +351,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
      * physical address */
     phys = xen_bus_to_phys(dev_addr);

-    if (((dev_addr + size - 1 > dma_mask)) ||
+    if (((dev_addr + size - 1 <= dma_mask)) ||
         range_straddles_page_boundary(phys, size))
         xen_destroy_contiguous_region(phys, order);

--
2.14.3 (Apple Git-98)
Re: [PATCH UPSTREAM] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
On 5/17/18 12:10 PM, Greg KH wrote:
> On Thu, May 17, 2018 at 11:45:57AM -0700, Joe Jin wrote:
>> When run raidconfig from Dom0 we found that the Xen DMA heap is reduced,
>> but Dom Heap is increased by the same size. Tracing raidconfig we found
>> that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
>> to apply memory. If the memory allocated by Dom0 is not in the DMA area,
>> it will exchange memory with Xen to meet the requirement. Later drivers
>> call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent()
>> the check condition (dev_addr + size - 1 <= dma_mask) is always false,
>> it prevents calling xen_destroy_contiguous_region() to return the memory
>> to the Xen DMA heap.
>>
>> This issue introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
>> coherent alloc/dealloc check before swizzling the MFNs.".
>>
>> Signed-off-by: Joe Jin <joe@oracle.com>
>> Tested-by: John Sobecki <john.sobe...@oracle.com>
>> Reviewed-by: Rzeszutek Wilk <konrad.w...@oracle.com>
>> Cc: sta...@vger.kernel.org
>> ---
>>  drivers/xen/swiotlb-xen.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> What does "PATCH UPSTREAM" mean?

Oops, I forgot to remove UPSTREAM, which is a tag for internal review.
Sorry about this; I will resend it without the tag.

Thanks,
Joe

>
> confused,
>
> greg k-h
>
[PATCH UPSTREAM] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
When running raidconfig from Dom0 we found that the Xen DMA heap is reduced,
but Dom Heap is increased by the same size. Tracing raidconfig we found
that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
it will exchange memory with Xen to meet the requirement. Later drivers
call dma_free_coherent() to free the memory; in xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
which prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.

This issue was introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".

Signed-off-by: Joe Jin <joe@oracle.com>
Tested-by: John Sobecki <john.sobe...@oracle.com>
Reviewed-by: Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: sta...@vger.kernel.org
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c60899fdbc..a6f9ba85dc4b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -351,7 +351,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 	 * physical address */
 	phys = xen_bus_to_phys(dev_addr);

-	if (((dev_addr + size - 1 > dma_mask)) ||
+	if (((dev_addr + size - 1 <= dma_mask)) ||
 	    range_straddles_page_boundary(phys, size))
 		xen_destroy_contiguous_region(phys, order);

--
2.14.3 (Apple Git-98)
Re: [PATCH V2] [scsi] enclosure: remove duplicate device before add new
Hi James,

Could you please review the patch and comment on it?

Thanks,
Joe

On 09/20/13 08:16, Joe Jin wrote:
> When do disk pull/insert test we encountered below:
>
> WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xbc/0xe0()
> Hardware name: SUN FIRE X4370 M2 SERVER
> sysfs: cannot create duplicate filename
> '/devices/pci:00/:00:03.0/:0d:00.0/host6/port-6:1/expander-6:1/port-6:1:14/end_device-6:1:14/target6:0:27/6:0:27:0/enclosure_device:HDD10'
> Modules linked in: mptctl mptbase autofs4 hidp bluetooth rfkill lockd sunrpc
> bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad
> ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
> libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath video
> sbs sbshc acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport ipmi_si
> ipmi_devintf ipmi_msghandler sg ses enclosure ixgbe e1000e hwmon igb
> snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
> snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
> iTCO_wdt pcspkr i2c_i801 ioatdma ghes iTCO_vendor_support hed dca i2c_core
> i7core_edac edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log
> dm_mod usb_storage shpchp mpt2sas scsi_transport_sas raid_class ahci libahci
> sd_mod crc_t10dif raid1 ext3 jbd mbcache
> Pid: 23302, comm: kworker/u:2 Tainted: P 2.6.39-400.124.1.el5uek #1
> Call Trace:
>  [] ? sysfs_add_one+0xbc/0xe0
>  [] warn_slowpath_common+0x90/0xc0
>  [] warn_slowpath_fmt+0x6e/0x70
>  [] ? strlcat+0x54/0x70
>  [] sysfs_add_one+0xbc/0xe0
>  [] sysfs_do_create_link+0x148/0x1d0
>  [] sysfs_create_link+0x13/0x20
>  [] enclosure_add_links+0xe7/0x110 [enclosure]
>  [] ? kobject_release+0xd/0x10
>  [] ? kref_put+0x37/0x70
>  [] enclosure_add_device+0x93/0xa0 [enclosure]
>  [] ses_enclosure_find_by_addr+0x76/0xc0 [ses]
>  [] ? ses_get_fault+0x40/0x40 [ses]
>  [] enclosure_for_each_device+0x63/0x90 [enclosure]
>  [] ses_match_to_enclosure+0x11a/0x1d0 [ses]
>  [] ses_intf_add+0x2c8/0x5c0 [ses]
>  [] ? kobject_get+0x1a/0x30
>  [] ? add_tail+0x36/0x50
>  [] device_add+0x2d4/0x380
>  [] scsi_sysfs_add_sdev+0xe6/0x2a0
>  [] scsi_add_lun+0x41c/0x560
>  [] scsi_probe_and_add_lun+0x1e0/0x3e0
>  [] ? default_spin_lock_flags+0x9/0x10
>  [] __scsi_scan_target+0xe7/0x120
>  [] scsi_scan_target+0xcd/0xf0
>  [] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
>  [] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
>  [] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
>  [] _scsih_add_device+0x248/0x340 [mpt2sas]
>  [] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
>  [] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
>  [] ? add_timer+0x18/0x20
>  [] ? queue_delayed_work_on+0xc5/0x170
>  [] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
>  [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
>  [] process_one_work+0xf9/0x370
>  [] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
>  [] worker_thread+0xca/0x240
>  [] ? manage_workers+0x90/0x90
>  [] kthread+0x97/0xa0
>  [] kernel_thread_helper+0x4/0x10
>  [] ? kthread_bind+0x80/0x80
>  [] ? gs_change+0x13/0x13
> ---[ end trace 89a1351702ab360f ]---
>
> This caused by duplicate device in enclosure list, we need to remove the
> possible duplicate entry to avoid the conflict when we add new one.
>
> Cc: James Bottomley
> Signed-off-by: Joe Jin
> ---
>  drivers/misc/enclosure.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
> index 0e8df41..173974d 100644
> --- a/drivers/misc/enclosure.c
> +++ b/drivers/misc/enclosure.c
> @@ -325,6 +325,8 @@ int enclosure_add_device(struct enclosure_device *edev,
> int component,
>  	if (cdev->dev)
>  		enclosure_remove_links(cdev);
>
> +	enclosure_remove_device(edev, dev);
> +
>  	put_device(cdev->dev);
>  	cdev->dev = get_device(dev);
>  	return enclosure_add_links(cdev);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2] [scsi] enclosure: remove duplicate device before add new
When doing a disk pull/insert test we encountered the warning below:

WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xbc/0xe0()
Hardware name: SUN FIRE X4370 M2 SERVER
sysfs: cannot create duplicate filename
'/devices/pci:00/:00:03.0/:0d:00.0/host6/port-6:1/expander-6:1/port-6:1:14/end_device-6:1:14/target6:0:27/6:0:27:0/enclosure_device:HDD10'
Modules linked in: mptctl mptbase autofs4 hidp bluetooth rfkill lockd sunrpc
bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath video
sbs sbshc acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport ipmi_si
ipmi_devintf ipmi_msghandler sg ses enclosure ixgbe e1000e hwmon igb
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
iTCO_wdt pcspkr i2c_i801 ioatdma ghes iTCO_vendor_support hed dca i2c_core
i7core_edac edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log
dm_mod usb_storage shpchp mpt2sas scsi_transport_sas raid_class ahci libahci
sd_mod crc_t10dif raid1 ext3 jbd mbcache
Pid: 23302, comm: kworker/u:2 Tainted: P 2.6.39-400.124.1.el5uek #1
Call Trace:
 [] ? sysfs_add_one+0xbc/0xe0
 [] warn_slowpath_common+0x90/0xc0
 [] warn_slowpath_fmt+0x6e/0x70
 [] ? strlcat+0x54/0x70
 [] sysfs_add_one+0xbc/0xe0
 [] sysfs_do_create_link+0x148/0x1d0
 [] sysfs_create_link+0x13/0x20
 [] enclosure_add_links+0xe7/0x110 [enclosure]
 [] ? kobject_release+0xd/0x10
 [] ? kref_put+0x37/0x70
 [] enclosure_add_device+0x93/0xa0 [enclosure]
 [] ses_enclosure_find_by_addr+0x76/0xc0 [ses]
 [] ? ses_get_fault+0x40/0x40 [ses]
 [] enclosure_for_each_device+0x63/0x90 [enclosure]
 [] ses_match_to_enclosure+0x11a/0x1d0 [ses]
 [] ses_intf_add+0x2c8/0x5c0 [ses]
 [] ? kobject_get+0x1a/0x30
 [] ? add_tail+0x36/0x50
 [] device_add+0x2d4/0x380
 [] scsi_sysfs_add_sdev+0xe6/0x2a0
 [] scsi_add_lun+0x41c/0x560
 [] scsi_probe_and_add_lun+0x1e0/0x3e0
 [] ? default_spin_lock_flags+0x9/0x10
 [] __scsi_scan_target+0xe7/0x120
 [] scsi_scan_target+0xcd/0xf0
 [] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
 [] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
 [] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
 [] _scsih_add_device+0x248/0x340 [mpt2sas]
 [] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
 [] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
 [] ? add_timer+0x18/0x20
 [] ? queue_delayed_work_on+0xc5/0x170
 [] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
 [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [] process_one_work+0xf9/0x370
 [] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
 [] worker_thread+0xca/0x240
 [] ? manage_workers+0x90/0x90
 [] kthread+0x97/0xa0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_bind+0x80/0x80
 [] ? gs_change+0x13/0x13
---[ end trace 89a1351702ab360f ]---

This is caused by a duplicate device in the enclosure list; we need to remove
the possible duplicate entry to avoid the conflict when we add a new one.

Cc: James Bottomley
Signed-off-by: Joe Jin
---
 drivers/misc/enclosure.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 0e8df41..173974d 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -325,6 +325,8 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
 	if (cdev->dev)
 		enclosure_remove_links(cdev);

+	enclosure_remove_device(edev, dev);
+
 	put_device(cdev->dev);
 	cdev->dev = get_device(dev);
 	return enclosure_add_links(cdev);
--
1.8.3.1
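The idea behind the V2 change, dropping any stale entry for a device before registering it in a slot, can be sketched with a toy userspace model. Everything here is hypothetical (model_enclosure, MODEL_SLOTS and the model_* helpers are illustration names only); it assumes, as the description above suggests, that the warning comes from the same device being recorded against more than one component, and it does not touch sysfs at all.

```c
#include <stddef.h>

#define MODEL_SLOTS 8	/* hypothetical number of enclosure components */

/* Toy enclosure: each slot records which device currently occupies it. */
struct model_enclosure {
	const void *slot_dev[MODEL_SLOTS];
};

/* Count how many slots currently reference dev. */
static int model_refs(const struct model_enclosure *e, const void *dev)
{
	int i, n = 0;

	for (i = 0; i < MODEL_SLOTS; i++)
		if (e->slot_dev[i] == dev)
			n++;
	return n;
}

/* Drop dev from the first slot holding it; 0 on success, -1 if not found. */
static int model_remove_device(struct model_enclosure *e, const void *dev)
{
	int i;

	for (i = 0; i < MODEL_SLOTS; i++) {
		if (e->slot_dev[i] == dev) {
			e->slot_dev[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* Add without any cleanup: the same device can end up in two slots. */
static int model_add_device_old(struct model_enclosure *e, int slot,
				const void *dev)
{
	if (slot < 0 || slot >= MODEL_SLOTS)
		return -1;
	e->slot_dev[slot] = dev;
	return 0;
}

/* Add in the style of the V2 patch: remove a possible stale entry first. */
static int model_add_device_v2(struct model_enclosure *e, int slot,
			       const void *dev)
{
	if (slot < 0 || slot >= MODEL_SLOTS)
		return -1;
	(void)model_remove_device(e, dev);	/* "not found" is fine here */
	e->slot_dev[slot] = dev;
	return 0;
}
```

Adding the same device to slots 3 and 4 leaves two references with the old helper and exactly one, in slot 4, with the V2-style helper.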
Re: [PATCH] [scsi] enclosure: remove all possible sysfs entries before add device
add_tail+0x36/0x50
 [] device_add+0x2d4/0x380
 [] scsi_sysfs_add_sdev+0xe6/0x2a0
 [] scsi_add_lun+0x41c/0x560
 [] scsi_probe_and_add_lun+0x1e0/0x3e0
 [] ? default_spin_lock_flags+0x9/0x10
 [] __scsi_scan_target+0xe7/0x120
 [] scsi_scan_target+0xcd/0xf0
 [] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
 [] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
 [] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
 [] _scsih_add_device+0x248/0x340 [mpt2sas]
 [] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
 [] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
 [] ? add_timer+0x18/0x20
 [] ? queue_delayed_work_on+0xc5/0x170
 [] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
 [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [] process_one_work+0xf9/0x370
 [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [] process_one_work+0xf9/0x370
 [] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
 [] worker_thread+0xca/0x240
 [] ? manage_workers+0x90/0x90
 [] kthread+0x97/0xa0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_bind+0x80/0x80
 [] ? gs_change+0x13/0x13
---[ end trace 89a1351702ab360f ]---
[ses_enclosure_find_by_addr] call enclosure_add_device(edev=8817e4094000,i=4,efd->dev=8817e8304938),cdev=8817e4094cd0

From the message above you can see that the last attempt was for
enclosure_device:HDD10; the component index was not the same, hence the
conflict. BTW, 6:0:27:0 and 7:0:27:0 are the same disk.

>
>> > Cc: James Bottomley
>> > Signed-off-by: Joe Jin
>> > ---
>> >  drivers/misc/enclosure.c | 7 +++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
>> > index 0e8df41..efc0e86 100644
>> > --- a/drivers/misc/enclosure.c
>> > +++ b/drivers/misc/enclosure.c
>> > @@ -325,6 +325,13 @@ int enclosure_add_device(struct enclosure_device
>> > *edev, int component,
>> >  	if (cdev->dev)
>> >  		enclosure_remove_links(cdev);
>> >
>> > +	if (dev) {
> This test is pointless.  Adding a NULL device is illegal.

Yes, this is right.

Thanks,
Joe

>
>> > +		char name[ENCLOSURE_NAME_SIZE];
>> > +
>> > +		enclosure_link_name(cdev, name);
>> > +		sysfs_remove_link(&dev->kobj, name);
> If we're really going to force eject the device, then this should be
> enclosure_remove_device(edev, dev);
>
> How do you prevent the case for remove re-add in the same slot?  Surely
> in that case, with your code, the link will get removed again when the
> remove gets processed, so the slot will then look empty (even though
> it's not).
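The ordering problem raised in the review above (a re-add processed before the pending remove for the same slot) can be shown with a deliberately tiny model. This is a sketch only: model_slot and both handlers are hypothetical, and the sysfs link is reduced to a boolean.

```c
#include <stdbool.h>

/* Toy slot state; all names here are hypothetical. */
struct model_slot {
	bool disk_present;	/* ground truth: a disk is physically in the slot */
	bool link_exists;	/* what the slot's link would show */
};

/* Add handler in the style of the proposed patch: force-remove, then create. */
static void model_handle_add(struct model_slot *s)
{
	s->link_exists = false;	/* forced cleanup of any stale link */
	s->link_exists = true;	/* link for the newly added device */
}

/* Remove handler: tears down whatever link the slot currently has. */
static void model_handle_remove(struct model_slot *s)
{
	s->link_exists = false;
}
```

If a disk is pulled and re-inserted and the add is handled first, the late remove clears the freshly created link, so the slot looks empty while disk_present is still true.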
Re: [PATCH] [scsi] enclosure: remove all possible sysfs entries before add device
On 09/09/13 21:41, Christoph Hellwig wrote:
>> Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U)
>
> Please reproduce without this weird crap loaded.
>

These modules are filesystem modules and will not impact the enclosure code.

Thanks,
Joe
[PATCH] [scsi] enclosure: remove all possible sysfs entries before add device
When running a disk pull/insert test we hit the warning below:

WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xbc/0xe0()
Hardware name: SUN FIRE X4370 M2 SERVER
sysfs: cannot create duplicate filename '/devices/pci:00/:00:03.0/:0d:00.0/host6/port-6:1/expander-6:1/port-6:1:14/end_device-6:1:14/target6:0:27/6:0:27:0/enclosure_device:HDD10'
Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) mptctl mptbase autofs4 hidp bluetooth rfkill lockd sunrpc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath video sbs sbshc acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport ipmi_si ipmi_devintf ipmi_msghandler sg ses enclosure ixgbe e1000e hwmon igb snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr i2c_i801 ioatdma ghes iTCO_vendor_support hed dca i2c_core i7core_edac edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage shpchp mpt2sas scsi_transport_sas raid_class ahci libahci sd_mod crc_t10dif raid1 ext3 jbd mbcache
Pid: 23302, comm: kworker/u:2 Tainted: P 2.6.39-400.124.1.el5uek #1
Call Trace:
 [811daf8c] ? sysfs_add_one+0xbc/0xe0
 [8106f030] warn_slowpath_common+0x90/0xc0
 [8106f15e] warn_slowpath_fmt+0x6e/0x70
 [81258bd4] ? strlcat+0x54/0x70
 [811daf8c] sysfs_add_one+0xbc/0xe0
 [811dbec8] sysfs_do_create_link+0x148/0x1d0
 [811dbf83] sysfs_create_link+0x13/0x20
 [a00de307] enclosure_add_links+0xe7/0x110 [enclosure]
 [8125325d] ? kobject_release+0xd/0x10
 [812549e7] ? kref_put+0x37/0x70
 [a00de3c3] enclosure_add_device+0x93/0xa0 [enclosure]
 [a00c8666] ses_enclosure_find_by_addr+0x76/0xc0 [ses]
 [a00c85f0] ? ses_get_fault+0x40/0x40 [ses]
 [a00de433] enclosure_for_each_device+0x63/0x90 [enclosure]
 [a00c8a8a] ses_match_to_enclosure+0x11a/0x1d0 [ses]
 [a00c8e08] ses_intf_add+0x2c8/0x5c0 [ses]
 [8125327a] ? kobject_get+0x1a/0x30
 [814e8b56] ? add_tail+0x36/0x50
 [81345ae4] device_add+0x2d4/0x380
 [8136b096] scsi_sysfs_add_sdev+0xe6/0x2a0
 [813682cc] scsi_add_lun+0x41c/0x560
 [81368a80] scsi_probe_and_add_lun+0x1e0/0x3e0
 [81041009] ? default_spin_lock_flags+0x9/0x10
 [813696e7] __scsi_scan_target+0xe7/0x120
 [81369b8d] scsi_scan_target+0xcd/0xf0
 [a003faab] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
 [a009a74f] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
 [a008d437] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
 [a0094eb8] _scsih_add_device+0x248/0x340 [mpt2sas]
 [a0098cb1] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
 [a00977b6] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
 [81080698] ? add_timer+0x18/0x20
 [8108a405] ? queue_delayed_work_on+0xc5/0x170
 [a0097a85] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
 [a0097ad9] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [8108c0d9] process_one_work+0xf9/0x370
 [a0097ac0] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
 [8108ca1a] worker_thread+0xca/0x240
 [8108c950] ? manage_workers+0x90/0x90
 [81090ff7] kthread+0x97/0xa0
 [8150fdc4] kernel_thread_helper+0x4/0x10
 [81090f60] ? kthread_bind+0x80/0x80
 [8150fdc0] ? gs_change+0x13/0x13
---[ end trace 89a1351702ab360f ]---

During our test multipath was in use, so each LUN has 2 paths. When the
second path was added, the enclosure code did not check whether the symlink
for the device being added already existed.

Cc: James Bottomley
Signed-off-by: Joe Jin
---
 drivers/misc/enclosure.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 0e8df41..efc0e86 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -325,6 +325,13 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
 	if (cdev->dev)
 		enclosure_remove_links(cdev);
 
+	if (dev) {
+		char name[ENCLOSURE_NAME_SIZE];
+
+		enclosure_link_name(cdev, name);
+		sysfs_remove_link(&dev->kobj, name);
+	}
+
 	put_device(cdev->dev);
 	cdev->dev = get_device(dev);
 	return enclosure_add_links(cdev);
-- 
1.8.3.1
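The idea behind the patch (drop any existing link of the same name before creating it again) can be tried out in userspace. The following is only a rough sketch under stated assumptions: replace_symlink() is a hypothetical helper built on the POSIX unlink() and symlink() calls, and it is an analogue of the approach, not the sysfs code the patch touches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the fix: remove a possibly stale
 * link before creating the new one, so that a second "add" (for example
 * the second path of a multipath LUN) does not fail with EEXIST.
 * Returns 0 on success or a negative errno value.
 */
static int replace_symlink(const char *target, const char *linkpath)
{
	assert(target && linkpath);

	/* ENOENT only means there was nothing to remove. */
	if (unlink(linkpath) != 0 && errno != ENOENT)
		return -errno;

	if (symlink(target, linkpath) != 0)
		return -errno;

	return 0;
}
```

Calling replace_symlink() twice on the same link path succeeds both times, whereas a bare symlink() fails the second time with EEXIST. Note that the unlink/symlink pair is not atomic, so this sketch assumes a single caller.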
Re: [PATCH] dm: allow error target to replace either bio-based and request-based targets
On 08/23/13 08:17, Mike Snitzer wrote:
> Here is a patch that should work for your needs (I tested it to work
> with 'dmsetup wipe_table' on both request-based and bio-based devices):

This is really what I was looking for, thanks!

>
> From: Mike Snitzer
> Date: Thu, 22 Aug 2013 18:21:38 -0400
> Subject: [PATCH] dm: allow error target to replace either bio-based and
> request-based targets
>
> It may be useful to switch a request-based table to the "error" target.
> Enhance the DM core to allow a single hybrid target to be capable of
> handling either bios or requests.
>
> Add a request-based (.map_rq) member to the error target_type and train
> dm_table_set_type() to prefer the md's established type (request-based
> or bio-based). If the md doesn't have an established type default to
> making the hybrid target bio-based.

Signed-off-by: Joe Jin

Thanks,
Joe
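The type-selection rule described in the quoted patch can be sketched in plain C. This is an illustration only: the enum md_type and the function pick_hybrid_type() are hypothetical names invented here, and the code is not the actual dm_table_set_type() implementation.

```c
/* Hypothetical stand-ins for the types a mapped device can have. */
enum md_type {
	MD_TYPE_NONE,		/* no established type yet */
	MD_TYPE_BIO_BASED,
	MD_TYPE_REQUEST_BASED,
};

/*
 * Choose the type for a table that consists of a hybrid target, i.e. a
 * target that can handle both bios and requests: keep the device's
 * established type if it has one, otherwise default to bio-based.
 */
static enum md_type pick_hybrid_type(enum md_type established)
{
	if (established == MD_TYPE_NONE)
		return MD_TYPE_BIO_BASED;

	return established;
}
```

With this rule a request-based device stays request-based when its table is replaced by the hybrid target, so the device type never has to change.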
Re: [dm-devel] [PATCH v2] dm ioctl: allow change device target type to error
Mikulas, thanks for your suggestion. I created a new patch; could you please
review it?

Subject: dm: add map_rq define for error

Commit a5664da "dm ioctl: make bio or request based device type immutable"
prevents "dmsetup wipe_table" from changing the target type to "error",
because the error target type has no map_rq.

Signed-off-by: Joe Jin
---
 drivers/md/dm-target.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 37ba5db..b690910 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -131,12 +131,19 @@ static int io_err_map(struct dm_target *tt, struct bio *bio)
 	return -EIO;
 }
 
+static int io_err_map_rq(struct dm_target *ti, struct request *clone,
+			 union map_info *map_context)
+{
+	return -EIO;
+}
+
 static struct target_type error_target = {
 	.name = "error",
 	.version = {1, 1, 0},
 	.ctr = io_err_ctr,
 	.dtr = io_err_dtr,
 	.map = io_err_map,
+	.map_rq = io_err_map_rq,
 };
 
 int __init dm_target_init(void)
-- 
1.8.3.1

On 08/21/13 22:48, Mikulas Patocka wrote:
>
>
> On Wed, 21 Aug 2013, Joe Jin wrote:
>
>> commit a5664da "dm ioctl: make bio or request based device type immutable"
>> prevented "dmsetup wipe_table" change the target type to "error".
>
> That commit a5664da is there for a reason (it is not possible to change
> bio-based device to request-based and vice versa) and I don't really see
> how this patch is supposed to work.
>
> If there are bios that are in flight and that already passed through
> blk_queue_bio, and you change the device from request-based to bio-based,
> what are you going to do with them? - The patch doesn't do anything about
> it.
>
> A better approach would be to create a new request-based target "error-rq"
> and change the multipath target to "error-rq" target. That way, you don't
> have to change device type from request based to bio based.
>
> Mikulas
>
>> -v2: setup md->queue even target type is "error".
>>
>> Signed-off-by: Joe Jin
>> ---
>>  drivers/md/dm-ioctl.c | 4
>>  drivers/md/dm-table.c | 12
>>  drivers/md/dm.h | 1 +
>>  3 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
>> index f1b7586..2a9b63d 100644
>> --- a/drivers/md/dm-ioctl.c
>> +++ b/drivers/md/dm-ioctl.c
>> @@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
>>  		goto out;
>>  	}
>>  
>> +	if (dm_is_error_target(t))
>> +		goto error_target;
>> +
>>  	/* Protect md->type and md->queue against concurrent table loads. */
>>  	dm_lock_md_type(md);
>>  	if (dm_get_md_type(md) == DM_TYPE_NONE)
>> @@ -1293,6 +1296,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
>>  		goto out;
>>  	}
>>  
>> +error_target:
>>  	/* setup md->queue to reflect md's type (may block) */
>>  	r = dm_setup_md_queue(md);
>>  	if (r) {
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index f221812..27be46a 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
>>  	return 0;
>>  }
>>  
>> +bool dm_is_error_target(struct dm_table *t)
>> +{
>> +	unsigned i;
>> +
>> +	for (i = 0; i < t->num_targets; i++) {
>> +		struct dm_target *tgt = t->targets + i;
>> +		if (strcmp(tgt->type->name, "error") == 0)
>> +			return true;
>> +	}
>> +	return false;
>> +}
>> +
>>  int dm_table_create(struct dm_table **result, fmode_t mode,
>>  		    unsigned num_targets, struct mapped_device *md)
>>  {
>> diff --git a/drivers/md/dm.h b/drivers/md/dm.h
>> index 45b97da..c7bceeb 100644
>> --- a/drivers/md/dm.h
>> +++ b/drivers/md/dm.h
>> @@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
>>  struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
>>  bool dm_table_request_based(struct dm_table *t);
>>  bool dm_table_supports_discards(struct dm_table *t);
>> +bool dm_is_error_target(struct dm_table *t);
>>  int dm_table_alloc_md_mempools(struct dm_table *t);
>>  void dm_table_free_md_mempools(struct dm_table *t);
>>  struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
>> --
>> 1.8.3.1
>>
>> --
>> dm-devel mailing list
>> dm-de...@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>>
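Why adding a map_rq callback is enough can be illustrated with a small standalone model. Everything below is a hypothetical sketch: struct toy_target_type and the toy_* functions are invented for illustration and only mimic the shape of the kernel's target_type; the assumption is that a target counts as request-capable when its map_rq pointer is set.

```c
#include <errno.h>
#include <stddef.h>

struct io_unit;	/* opaque stand-in for a bio or a request */

/* Hypothetical, simplified model of a target type's operations table. */
struct toy_target_type {
	const char *name;
	int (*map)(struct io_unit *io);		/* bio-based entry point */
	int (*map_rq)(struct io_unit *io);	/* request-based entry point */
};

/* Both entry points of the error target simply fail the I/O. */
static int toy_err_map(struct io_unit *io)
{
	(void)io;
	return -EIO;
}

static int toy_err_map_rq(struct io_unit *io)
{
	(void)io;
	return -EIO;
}

static const struct toy_target_type toy_error_target = {
	.name	= "error",
	.map	= toy_err_map,
	.map_rq	= toy_err_map_rq,
};

/* A target can back a request-based device only if it provides map_rq. */
static int toy_supports_request_based(const struct toy_target_type *tt)
{
	return tt->map_rq != NULL;
}

/* A target can back a bio-based device only if it provides map. */
static int toy_supports_bio_based(const struct toy_target_type *tt)
{
	return tt->map != NULL;
}
```

In this model the error target passes both checks once map_rq is filled in, which is the property the patch relies on to let a request-based multipath table be replaced by the error target.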
Re: [PATCH v2] dm ioctl: allow change device target type to error
On 08/21/13 23:06, Mike Snitzer wrote:
> On Wed, Aug 21 2013 at 10:48am -0400,
> Mikulas Patocka wrote:
>
>>
>>
>> On Wed, 21 Aug 2013, Joe Jin wrote:
>>
>>> commit a5664da "dm ioctl: make bio or request based device type immutable"
>>> prevented "dmsetup wipe_table" change the target type to "error".
>>
>> That commit a5664da is there for a reason (it is not possible to change
>> bio-based device to request-based and vice versa) and I don't really see
>> how this patch is supposed to work.
>>
>> If there are bios that are in flight and that already passed through
>> blk_queue_bio, and you change the device from request-based to bio-based,
>> what are you going to do with them? - The patch doesn't do anything about
>> it.
>>
>> A better approach would be to create a new request-based target "error-rq"
>> and change the multipath target to "error-rq" target. That way, you don't
>> have to change device type from request based to bio based.
>
> My thoughts _exactly_. This patch is very confused.
>
> Joe, what are you looking to be able to do? Switch a dm-multipath
> device to error? Or allowing switching a target that has
> DM_TARGET_IMMUTABLE flag set to be switched to error target?
>
> The latter restriction was introduced with commit 36a0456fb ("dm table:
> add immutable feature").

Hi Mike,

dmsetup currently supports wipe_table:
https://bugzilla.redhat.com/show_bug.cgi?id=742607

As described in the bug's Doc Text, "This could be useful, for example, if a
long-running process keeps a device open after it has finished using it and
you need to release the underlying devices before that process exits."

Since commit a5664da was applied, wipe_table no longer works.

Thanks,
Joe
[PATCH v2] dm ioctl: allow change device target type to error
Commit a5664da "dm ioctl: make bio or request based device type immutable"
prevents "dmsetup wipe_table" from changing the target type to "error".

-v2: set up md->queue even when the target type is "error".

Signed-off-by: Joe Jin
---
 drivers/md/dm-ioctl.c | 4
 drivers/md/dm-table.c | 12
 drivers/md/dm.h | 1 +
 3 files changed, 17 insertions(+)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index f1b7586..2a9b63d 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	if (dm_is_error_target(t))
+		goto error_target;
+
 	/* Protect md->type and md->queue against concurrent table loads. */
 	dm_lock_md_type(md);
 	if (dm_get_md_type(md) == DM_TYPE_NONE)
@@ -1293,6 +1296,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+error_target:
 	/* setup md->queue to reflect md's type (may block) */
 	r = dm_setup_md_queue(md);
 	if (r) {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f221812..27be46a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 	return 0;
 }
 
+bool dm_is_error_target(struct dm_table *t)
+{
+	unsigned i;
+
+	for (i = 0; i < t->num_targets; i++) {
+		struct dm_target *tgt = t->targets + i;
+		if (strcmp(tgt->type->name, "error") == 0)
+			return true;
+	}
+	return false;
+}
+
 int dm_table_create(struct dm_table **result, fmode_t mode,
 		    unsigned num_targets, struct mapped_device *md)
 {
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 45b97da..c7bceeb 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
 struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 bool dm_table_supports_discards(struct dm_table *t);
+bool dm_is_error_target(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-- 
1.8.3.1
[PATCH] dm ioctl: allow change device target type to error
Commit a5664da "dm ioctl: make bio or request based device type immutable"
prevents "dmsetup wipe_table" from changing the target type to "error".

Signed-off-by: Joe Jin
---
 drivers/md/dm-ioctl.c | 6 +-
 drivers/md/dm-table.c | 12
 drivers/md/dm.h | 1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index f1b7586..1ee9e41 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1250,7 +1250,7 @@ static int populate_table(struct dm_table *table,
 
 static int table_load(struct dm_ioctl *param, size_t param_size)
 {
-	int r;
+	int r = 0;
 	struct hash_cell *hc;
 	struct dm_table *t, *old_map = NULL;
 	struct mapped_device *md;
@@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	if (dm_is_error_target(t))
+		goto error_target;
+
 	/* Protect md->type and md->queue against concurrent table loads. */
 	dm_lock_md_type(md);
 	if (dm_get_md_type(md) == DM_TYPE_NONE)
@@ -1303,6 +1306,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 	}
 	dm_unlock_md_type(md);
 
+error_target:
 	/* stage inactive table */
 	down_write(&_hash_lock);
 	hc = dm_get_mdptr(md);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f221812..27be46a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 	return 0;
 }
 
+bool dm_is_error_target(struct dm_table *t)
+{
+	unsigned i;
+
+	for (i = 0; i < t->num_targets; i++) {
+		struct dm_target *tgt = t->targets + i;
+		if (strcmp(tgt->type->name, "error") == 0)
+			return true;
+	}
+	return false;
+}
+
 int dm_table_create(struct dm_table **result, fmode_t mode,
 		    unsigned num_targets, struct mapped_device *md)
 {
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 45b97da..c7bceeb 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
 struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 bool dm_table_supports_discards(struct dm_table *t);
+bool dm_is_error_target(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-- 
1.8.3.1
Re: [Xen-devel] [PATCH] xen: initialize xen panic handler for PVHVM
On 08/16/13 20:43, Konrad Rzeszutek Wilk wrote:
> Could you tell me what has been happening without this patch?

Without this patch, Xen does not receive the crash event from a PVHVM guest, so any on_crash setting in the guest configuration file is never triggered.

Thanks,
Joe
Re: kernel panic in skb_copy_bits
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream DM. Note these aren't real fixes, just a workaround
>>> of a kernel bug.
>>
>> The guest is pvm, and disk model is xvbd, guest config file as below:
>
> Do you know which disk backend? The workaround Alex refers to went into
> qdisk but I think blkback could still suffer from a variant of the
> retransmit issue if you run it over iSCSI.
>
>>> To fix on a local build of xen you will need something like this:
>>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
>>> and something like this (NB: obviously insert your own git
>>> repo and commit numbers)
>>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>>>
>>
>> I think this only for pvhvm/hvm?
>
> No, the underlying issue affects any PV device which is run over a
> network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> cross over the delayed ack and cause I/O to be completed while
> retransmits are pending, such as is described in
> http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> variant). The problem is that because Xen PV drivers often unmap the
> page on I/O completion you get a crash (page fault) on the retransmit.
>

Could we record the grant page's refcount when mapping it and, when unmapping, check whether the refcount is still the same as it was at map time? That change would be limited to xen-blkback.

Another way is to add a new page flag such as PG_send: set the bit when sendpage() is called and clear it when the page is put. xen-blkback could then wait on the page wait queue.

Thanks,
Joe

> The issue also affects native but in that case the symptom is "just" a
> corrupt packet on the wire. I tried to address this with my "skb
> destructor" series but unfortunately I got bogged down on the details,
> then I had to take time out to look into some other stuff and never
> managed to get back into it. I'd be very grateful if there was someone
> who could pick up that work (Alex gave some useful references in another
> reply to this thread)
>
> Some PV disk backends (e.g. blktap2) have worked around this by using
> grant copy instead of grant map, others (e.g. qdisk) have disabled
> O_DIRECT so that the pages are copied into the dom0 page cache and
> transmitted from there.
>
> We were discussing recently the possibility of mapping all ballooned out
> pages to a single read-only scratch page instead of leaving them empty
> in the page tables, this would cause the Xen case to revert to the
> native case. I think Thanos was going to take a look into this.
>
> Ian.
>

--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
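The first idea in the reply above (remember the refcount at map time, compare at unmap time) can be sketched in plain userspace C. This is only a toy model under stated assumptions: struct gpage and the gpage_* helpers are hypothetical names, and nothing here reflects how xen-blkback or the grant table code is actually structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a granted page; not a kernel structure. */
struct gpage {
	int refcount;     /* current number of holders */
	int refs_at_map;  /* refcount recorded when the grant was mapped */
	bool mapped;
};

/* Record the refcount seen at map time. */
static void gpage_map(struct gpage *p)
{
	p->refs_at_map = p->refcount;
	p->mapped = true;
}

/* Models another subsystem (e.g. the network stack) taking a reference. */
static void gpage_get(struct gpage *p)
{
	p->refcount++;
}

/* Models that subsystem dropping its reference again. */
static void gpage_put(struct gpage *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Unmap only if nobody holds an extra reference taken since mapping.
 * Returns false when the caller has to wait and retry.
 */
static bool gpage_try_unmap(struct gpage *p)
{
	if (p->refcount != p->refs_at_map)
		return false;
	p->mapped = false;
	return true;
}
```

In this model a pending retransmit is simply a holder that has called gpage_get() but not yet gpage_put(), so the unmap is refused until it finishes.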
Re: kernel panic in skb_copy_bits
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream DM. Note these aren't real fixes, just a workaround
>>> of a kernel bug.
>>
>> The guest is pvm, and disk model is xvbd, guest config file as below:
>
> Do you know which disk backend? The workaround Alex refers to went into
> qdisk but I think blkback could still suffer from a variant of the
> retransmit issue if you run it over iSCSI.

The backend is xen-blkback on iSCSI storage.

>
>>> To fix on a local build of xen you will need something like this:
>>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
>>> and something like this (NB: obviously insert your own git
>>> repo and commit numbers)
>>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>>>
>>
>> I think this only for pvhvm/hvm?
>
> No, the underlying issue affects any PV device which is run over a
> network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> cross over the delayed ack and cause I/O to be completed while
> retransmits are pending, such as is described in
> http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> variant). The problem is that because Xen PV drivers often unmap the
> page on I/O completion you get a crash (page fault) on the retransmit.
>

To keep iSCSI's sendpage() call from reusing the page, we disabled scatter-gather (sg) on the NIC; in our tests the panic went away. This also confirms that the page was unmapped by the grant system, and the symptom is the same as in the NFS panic.

> The issue also affects native but in that case the symptom is "just" a
> corrupt packet on the wire. I tried to address this with my "skb
> destructor" series but unfortunately I got bogged down on the details,
> then I had to take time out to look into some other stuff and never
> managed to get back into it. I'd be very grateful if there was someone
> who could pick up that work (Alex gave some useful references in another
> reply to this thread)
>
> Some PV disk backends (e.g. blktap2) have worked around this by using
> grant copy instead of grant map, others (e.g. qdisk) have disabled
> O_DIRECT so that the pages are copied into the dom0 page cache and
> transmitted from there.

That workaround has the same effect as disabling sg on the NIC: with sg disabled, sendpage creates its own copy of the page rather than reusing it.

Thanks,
Joe

>
> We were discussing recently the possibility of mapping all ballooned out
> pages to a single read-only scratch page instead of leaving them empty
> in the page tables, this would cause the Xen case to revert to the
> native case. I think Thanos was going to take a look into this.
>
> Ian.
>
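Why copying avoids the crash can be shown with a small userspace analogy. This is a sketch under assumptions: struct pending_send and its helpers are hypothetical names, and a heap buffer stands in for the granted page. Once the payload has been copied into the sender's own buffer, the original can be released without affecting a later retransmit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of data queued for (re)transmission. */
struct pending_send {
	unsigned char *data;
	size_t len;
};

/*
 * Copy-on-queue: snapshot the payload so the caller may release its
 * buffer immediately. Returns 0 on success, -1 on allocation failure.
 */
static int queue_copy(struct pending_send *ps, const unsigned char *buf,
		      size_t len)
{
	ps->data = malloc(len ? len : 1);
	if (!ps->data)
		return -1;
	memcpy(ps->data, buf, len);
	ps->len = len;
	return 0;
}

/* Drop the private copy once the transfer is finally acknowledged. */
static void pending_release(struct pending_send *ps)
{
	free(ps->data);
	ps->data = NULL;
	ps->len = 0;
}
```

The cost is one extra copy per send, which is the trade-off the thread describes for the O_DIRECT and grant-copy workarounds.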
Re: kernel panic in skb_copy_bits
On 06/30/13 17:13, Alex Bligh wrote:
>
>
> --On 28 June 2013 12:17:43 +0800 Joe Jin wrote:
>
>> Find a similar issue
>> http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen
>> developer as well.
>
> I thought this sounded familiar. I haven't got the start of this
> thread, but what version of Xen are you running and what device
> model? If before 4.3, there is a page lifetime bug in the kernel
> (not the xen code) which can affect anything where the guest accesses
> the host's block stack and that in turn accesses the networking
> stack (it may in fact be wider than that). So, e.g. domU on
> iSCSI will do it. It tends to get triggered by a TCP retransmit
> or (on NFS) the RPC equivalent. Essentially block operation
> is considered complete, returning through xen and freeing the
> grant table entry, and yet something in the kernel (e.g. tcp
> retransmit) can still access the data. The nature of the bug
> is extensively discussed in that thread - you'll also find
> a reference to a thread on linux-nfs which concludes it
> isn't an nfs problem, and even some patches to fix it in the
> kernel adding reference counting.

Do you know whether there is a fix for the above? So far we also suspect that the grant page is unmapped too early; we were using 4.1 stable during our tests.

>
> A workaround is to turn off O_DIRECT use by Xen as that ensures
> the pages are copied. Xen 4.3 does this by default.
>
> I believe fixes for this are in 4.3 and 4.2.2 if using the
> qemu upstream DM. Note these aren't real fixes, just a workaround
> of a kernel bug.

The guest is PVM and the disk model is xvbd; the guest config file is below:

vif = ['mac=00:21:f6:00:00:01,bridge=c0a80b00']
OVM_simple_name = 'Guest#1'
disk = ['file:/OVS/Repositories/0004fb0391e9eae94d1e907c/VirtualDisks/0004fb12f78799dad800ef47.img,xvda,w',
        'phy:/dev/mapper/360060e8010141870058b41570002,xvdb,w',
        'phy:/dev/mapper/360060e8010141870058b41570003,xvdc,w']
bootargs = ''
uuid = '0004fb00-0006--2b00-77a4766001ed'
on_reboot = 'restart'
cpu_weight = 27500
OVM_os_type = 'Oracle Linux 5'
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
memory = 4096
OVM_description = ''
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
guest_os_type = 'linux'
name = '0004fb062b0077a4766001ed'
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'

>
> To fix on a local build of xen you will need something like this:
> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> and something like this (NB: obviously insert your own git
> repo and commit numbers)
> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>

I think this is only for pvhvm/hvm?

Thanks,
Joe

> Also note those fixes are (technically) unsafe for live migration
> unless there is an ordering change made in qemu's block open
> call.
>
> Of course this might be something completely different.
>
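The page lifetime problem described above, and the reference-counting direction mentioned as a possible kernel fix, can be modeled in a few lines of userspace C. This is a toy illustration only: struct shared_buf and the buf_* helpers are hypothetical names, and the model assumes just two holders, the block request and the network stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer shared by a block request and the network stack. */
struct shared_buf {
	int refs;
	bool released;  /* set when the last reference is dropped */
};

/* The block request starts out as the only holder. */
static void buf_init(struct shared_buf *b)
{
	b->refs = 1;
	b->released = false;
}

/* A second user (e.g. a queued TCP segment) pins the buffer. */
static void buf_get(struct shared_buf *b)
{
	assert(!b->released);
	b->refs++;
}

/* Drop one reference; the buffer goes away only with the last one. */
static void buf_put(struct shared_buf *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0)
		b->released = true;
}
```

With this discipline, completing the block request (one buf_put()) cannot release data that a pending retransmit still needs; the bug in the thread corresponds to releasing on that first put regardless of other holders.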
Re: kernel panic in skb_copy_bits
On 06/29/13 15:20, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
>> Hi Eric,
>>
>> The patch not fix the issue and panic as same as early I posted:
>>> BUG: unable to handle kernel paging request at 88006d9e8d48
>>> IP: [] memcpy+0xb/0x120
>>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>>> Oops: [#1] SMP
>>> CPU 7
>>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback
>>> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding
>>> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>>> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
>>> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi
>>> xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
>>> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper
>>> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event
>>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm
>>> snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support
>>> pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core
>>> hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage
>>> lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase
>>> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>>
>>>
>>> Pid: 0, comm: swapper Tainted: GW 2.6.39-300.32.1.el5uek #1 Dell
>>> Inc. PowerEdge 2950/0DP246
>
>
> By the way my patch was for current kernels, not for 2.6.39
>
> For instance, I was not able to reproduce the crash with 3.3
>
> RCU in neighbour code was added in 2.6.37, but it looks like this code
> is a bit fragile because all the kfree_skb() are done while neighbour
> locks are held.
>
> So if a skb destructor triggers a new call to neighbour code, I presume
> some bad things can happen. LOCKDEP could eventually help to detect
> this.
>
> You could try to replace these kfree_skb() calls to dev_kfree_skb_irq()
> just in case.
>
> (Do not forget the __skb_queue_purge() ones)
>
> Try a LOCKDEP build as well.

So far we suspect it is caused by iSCSI calling sendpage(): the page is later unmapped while something still tries to copy the skb. We'll try disabling sg to see whether that helps.

Thanks,
Joe
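The concern raised above (destructors running while a lock is held and calling back into the same code) is commonly handled by detaching the items under the lock and destroying them only after the lock is dropped. The sketch below shows that general pattern in userspace C. It is not what dev_kfree_skb_irq() does internally; the names struct table, table_add, table_flush and item_destroy are hypothetical, and the lock is modeled as a plain flag so the example stays single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct item { struct item *next; };

/* Hypothetical table guarded by a single lock, modeled here as a flag. */
struct table {
	bool locked;
	struct item *head;
	unsigned destroyed;
};

/*
 * Destructor that may call back into table code, so it must never run
 * with the table lock held.
 */
static void item_destroy(struct table *t, struct item *it)
{
	assert(!t->locked);
	free(it);
	t->destroyed++;
}

/* Push one new item; returns 0 on success, -1 on allocation failure. */
static int table_add(struct table *t)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return -1;
	t->locked = true;
	it->next = t->head;
	t->head = it;
	t->locked = false;
	return 0;
}

/* Detach everything under the lock, destroy only after dropping it. */
static void table_flush(struct table *t)
{
	struct item *list;

	t->locked = true;
	list = t->head;
	t->head = NULL;
	t->locked = false;

	while (list) {
		struct item *it = list;

		list = it->next;
		item_destroy(t, it);
	}
}
```

The assert in item_destroy() plays the role a lock-debugging build would play in the kernel: it fails immediately if the destructor is ever reached with the lock held.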
Re: kernel panic in skb_copy_bits
Hi Eric,

The patch does not fix the issue; the panic is the same as the one I posted earlier:

> BUG: unable to handle kernel paging request at 88006d9e8d48
> IP: [] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> Oops: [#1] SMP
> CPU 7
> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback
> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding
> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs
> xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper
> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event
> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer
> snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi
> dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed
> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc
> scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase
> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>
>
> Pid: 0, comm: swapper Tainted: GW 2.6.39-300.32.1.el5uek #1 Dell
> Inc. PowerEdge 2950/0DP246
> RIP: e030:[] [] memcpy+0xb/0x120
> RSP: e02b:8801003c3d58 EFLAGS: 00010246
> RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
> RDX: RSI: 88006d9e8d48 RDI: 880076b9e280
> RBP: 8801003c3dc0 R08: 000bf723 R09:
> R10: R11: 000a R12: 0034
> R13: 0034 R14: 02b8 R15: 05a8
> FS: 7fc1e852a6e0() GS:8801003c() knlGS:
> CS: e033 DS: 002b ES: 002b CR0: 8005003b
> CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
> Stack:
> 8142db21 880076b9e280 8800637097f0
> 02ec 02b8 880077ac
> 8800637097f0 880066c9a7c0 fdb4 024c
> Call Trace:
>
> [] ? skb_copy_bits+0x1c1/0x2e0
> [] skb_copy+0xf3/0x120
> [] neigh_timer_handler+0x1ac/0x350
> [] ? account_idle_ticks+0xe/0x10
> [] ? neigh_alloc+0x180/0x180
> [] call_timer_fn+0x4a/0x110
> [] ? neigh_alloc+0x180/0x180
> [] run_timer_softirq+0x13a/0x220
> [] __do_softirq+0xb9/0x1d0
> [] ? handle_percpu_irq+0x48/0x70
> [] call_softirq+0x1c/0x30
> [] do_softirq+0x65/0xa0
> [] irq_exit+0xab/0xc0
> [] xen_evtchn_do_upcall+0x35/0x50
> [] xen_do_hypervisor_callback+0x1e/0x30
>
> [] ? xen_hypercall_sched_op+0xa/0x20
> [] ? xen_hypercall_sched_op+0xa/0x20
> [] ? xen_safe_halt+0x10/0x20
> [] ? default_idle+0x5b/0x170
> [] ? cpu_idle+0xc6/0xf0
> [] ? xen_irq_enable_direct_reloc+0x4/0x4
> [] ? cpu_bringup_and_idle+0xe/0x10
> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d
> 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 48 a5 89 d1 f3
> a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
> RIP [] memcpy+0xb/0x120
> RSP
> CR2: 88006d9e8d48

Thanks,
Joe

On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin
> Signed-off-by: Eric Dumazet
> ---
>  net/core/neighbour.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  			   we must kill timers etc. and move
>  			   it to safe state.
>  			 */
>
Re: kernel panic in skb_copy_bits
Hi Eric,

Thanks for your patch, I'll test it then get back to you.

Regards,
Joe

On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin
> Signed-off-by: Eric Dumazet
> ---
>  net/core/neighbour.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  			   we must kill timers etc. and move
>  			   it to safe state.
>  			 */
> -			skb_queue_purge(&n->arp_queue);
> +			__skb_queue_purge(&n->arp_queue);
>  			n->arp_queue_len_bytes = 0;
>  			n->output = neigh_blackhole;
>  			if (n->nud_state & NUD_VALID)
> @@ -286,7 +286,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
>  	if (!n)
>  		goto out_entries;
>
> -	skb_queue_head_init(&n->arp_queue);
> +	__skb_queue_head_init(&n->arp_queue);
>  	rwlock_init(&n->lock);
>  	seqlock_init(&n->ha_lock);
>  	n->updated	  = n->used = now;
> @@ -708,7 +708,9 @@ void neigh_destroy(struct neighbour *neigh)
>  	if (neigh_del_timer(neigh))
>  		pr_warn("Impossible event\n");
>
> -	skb_queue_purge(&neigh->arp_queue);
> +	write_lock_bh(&neigh->lock);
> +	__skb_queue_purge(&neigh->arp_queue);
> +	write_unlock_bh(&neigh->lock);
>  	neigh->arp_queue_len_bytes = 0;
>
>  	if (dev->netdev_ops->ndo_neigh_destroy)
> @@ -858,7 +860,7 @@ static void neigh_invalidate(struct neighbour *neigh)
>  		neigh->ops->error_report(neigh, skb);
>  		write_lock(&neigh->lock);
>  	}
> -	skb_queue_purge(&neigh->arp_queue);
> +	__skb_queue_purge(&neigh->arp_queue);
>  	neigh->arp_queue_len_bytes = 0;
>  }
>
> @@ -1210,7 +1212,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>
>  			write_lock_bh(&neigh->lock);
>  		}
> -		skb_queue_purge(&neigh->arp_queue);
> +		__skb_queue_purge(&neigh->arp_queue);
>  		neigh->arp_queue_len_bytes = 0;
>  	}
>  out:
>
>
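The locking convention the patch moves to (one outer lock protects the queue, and the purge helper takes no lock of its own) can be sketched in userspace C. This is only an illustration under assumptions: struct entry, entry_enqueue, queue_purge_locked and entry_destroy are hypothetical names, and the lock is modeled as a flag with assertions rather than a real rwlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct pkt { struct pkt *next; };

/* Hypothetical stand-in for a neighbour entry: one outer lock guards the queue. */
struct entry {
	bool lock_held;     /* models the owner's lock; a real one would be a rwlock */
	struct pkt *queue;
	unsigned queue_len;
};

static void entry_lock(struct entry *e)
{
	assert(!e->lock_held);
	e->lock_held = true;
}

static void entry_unlock(struct entry *e)
{
	assert(e->lock_held);
	e->lock_held = false;
}

/* Queue one packet under the entry lock; returns 0 or -1 on allocation failure. */
static int entry_enqueue(struct entry *e)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	entry_lock(e);
	p->next = e->queue;
	e->queue = p;
	e->queue_len++;
	entry_unlock(e);
	return 0;
}

/* Lock-free variant: the caller must already hold the entry lock. */
static void queue_purge_locked(struct entry *e)
{
	assert(e->lock_held);
	while (e->queue) {
		struct pkt *p = e->queue;

		e->queue = p->next;
		free(p);
	}
	e->queue_len = 0;
}

/* Destroy path: take the same lock every other queue user takes. */
static void entry_destroy(struct entry *e)
{
	entry_lock(e);
	queue_purge_locked(e);
	entry_unlock(e);
}
```

The assertion in queue_purge_locked() documents the contract that the double-underscore helpers rely on in the patch: the caller, not the helper, is responsible for holding the lock.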
Re: kernel panic in skb_copy_bits
Hi Eric,

The patch does not fix the issue; the panic is the same as the one I posted
earlier:

BUG: unable to handle kernel paging request at 88006d9e8d48
IP: [812605bb] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
Oops: [#1] SMP
CPU 7
Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback
xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding
be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs
xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper
drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event
snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer
snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi
dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed
dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc
scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase
scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache

Pid: 0, comm: swapper Tainted: GW 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
RIP: e030:[812605bb] [812605bb] memcpy+0xb/0x120
RSP: e02b:8801003c3d58 EFLAGS: 00010246
RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
RDX: RSI: 88006d9e8d48 RDI: 880076b9e280
RBP: 8801003c3dc0 R08: 000bf723 R09:
R10: R11: 000a R12: 0034
R13: 0034 R14: 02b8 R15: 05a8
FS: 7fc1e852a6e0() GS:8801003c() knlGS:
CS: e033 DS: 002b ES: 002b CR0: 8005003b
CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
Stack:
 8142db21 880076b9e280 8800637097f0
 02ec 02b8 880077ac
 8800637097f0 880066c9a7c0 fdb4 024c
Call Trace:
 <IRQ>
 [8142db21] ? skb_copy_bits+0x1c1/0x2e0
 [8142f173] skb_copy+0xf3/0x120
 [81447fbc] neigh_timer_handler+0x1ac/0x350
 [810573fe] ? account_idle_ticks+0xe/0x10
 [81447e10] ? neigh_alloc+0x180/0x180
 [8107dbaa] call_timer_fn+0x4a/0x110
 [81447e10] ? neigh_alloc+0x180/0x180
 [8107f82a] run_timer_softirq+0x13a/0x220
 [81075c39] __do_softirq+0xb9/0x1d0
 [810d9678] ? handle_percpu_irq+0x48/0x70
 [81511d3c] call_softirq+0x1c/0x30
 [810172e5] do_softirq+0x65/0xa0
 [8107656b] irq_exit+0xab/0xc0
 [812f97d5] xen_evtchn_do_upcall+0x35/0x50
 [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [810013aa] ? xen_hypercall_sched_op+0xa/0x20
 [810013aa] ? xen_hypercall_sched_op+0xa/0x20
 [8100a0b0] ? xen_safe_halt+0x10/0x20
 [8101dfeb] ? default_idle+0x5b/0x170
 [81014ac6] ? cpu_idle+0xc6/0xf0
 [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
 [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43
4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1
f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [812605bb] memcpy+0xb/0x120
 RSP 8801003c3d58
CR2: 88006d9e8d48

Thanks,
Joe

On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin joe@oracle.com
> Signed-off-by: Eric Dumazet eduma...@google.com
> ---
>  net/core/neighbour.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  				   we must kill timers etc. and move
Re: kernel panic in skb_copy_bits
Found a similar issue:
http://www.gossamer-threads.com/lists/xen/devel/265611

So I have copied the Xen developers as well.

On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> When we run a failover test with iSCSI + multipath by resetting the
>> switches on OVM (2.6.39), we hit this panic:
>>
>> BUG: unable to handle kernel paging request at 88006d9e8d48
>> IP: [812605bb] memcpy+0xb/0x120
>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>> Oops: [#1] SMP
>> CPU 7
>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback
>> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding
>> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
>> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs
>> xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
>> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper
>> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event
>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer
>> snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi
>> dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed
>> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc
>> scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase
>> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>
>>
>> Pid: 0, comm: swapper Tainted: GW 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
>> RIP: e030:[812605bb] [812605bb] memcpy+0xb/0x120
>> RSP: e02b:8801003c3d58 EFLAGS: 00010246
>> RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
>> RDX: RSI: 88006d9e8d48 RDI: 880076b9e280
>> RBP: 8801003c3dc0 R08: 000bf723 R09:
>> R10: R11: 000a R12: 0034
>> R13: 0034 R14: 02b8 R15: 05a8
>> FS: 7fc1e852a6e0() GS:8801003c() knlGS:
>> CS: e033 DS: 002b ES: 002b CR0: 8005003b
>> CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
>> DR0: DR1: DR2:
>> DR3: DR6: 0ff0 DR7: 0400
>> Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
>> Stack:
>>  8142db21 880076b9e280 8800637097f0
>>  02ec 02b8 880077ac
>>  8800637097f0 880066c9a7c0 fdb4 024c
>> Call Trace:
>>  <IRQ>
>>  [8142db21] ? skb_copy_bits+0x1c1/0x2e0
>>  [8142f173] skb_copy+0xf3/0x120
>>  [81447fbc] neigh_timer_handler+0x1ac/0x350
>>  [810573fe] ? account_idle_ticks+0xe/0x10
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107dbaa] call_timer_fn+0x4a/0x110
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107f82a] run_timer_softirq+0x13a/0x220
>>  [81075c39] __do_softirq+0xb9/0x1d0
>>  [810d9678] ? handle_percpu_irq+0x48/0x70
>>  [81511d3c] call_softirq+0x1c/0x30
>>  [810172e5] do_softirq+0x65/0xa0
>>  [8107656b] irq_exit+0xab/0xc0
>>  [812f97d5] xen_evtchn_do_upcall+0x35/0x50
>>  [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
>>  <EOI>
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [8100a0b0] ? xen_safe_halt+0x10/0x20
>>  [8101dfeb] ? default_idle+0x5b/0x170
>>  [81014ac6] ? cpu_idle+0xc6/0xf0
>>  [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
>>  [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
>> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43
>> 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1
>> f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
>> RIP [812605bb] memcpy+0xb/0x120
>>  RSP 8801003c3d58
>> CR2: 88006d9e8d48
>>
>> Reviewing the vmcore, I found that skb->users is 1 at that moment.
>> Checking the neighbour code history, I found that skb_get() was replaced
>> by skb_copy() in commit 7e36763b2c:
>>
>> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
>> Author: Frank Blaschka
>> Date:   Mon Mar 3 12:16:04 2008 -0800
>>
>>     [NET]: Fix race in generic address resolution.
>>
>>     neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
>>     has increased skbs refcount and calls so
Re: kernel panic in skb_copy_bits
Hi Eric,

Thanks for your response, I will test it and get back to you.

Regards,
Joe

On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> When we run a failover test with iSCSI + multipath by resetting the
>> switches on OVM (2.6.39), we hit this panic:
>>
>> BUG: unable to handle kernel paging request at 88006d9e8d48
>> IP: [812605bb] memcpy+0xb/0x120
>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>> Oops: [#1] SMP
>> CPU 7
>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback
>> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding
>> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
>> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs
>> xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
>> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper
>> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event
>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer
>> snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi
>> dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed
>> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc
>> scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase
>> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>
>>
>> Pid: 0, comm: swapper Tainted: GW 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
>> RIP: e030:[812605bb] [812605bb] memcpy+0xb/0x120
>> RSP: e02b:8801003c3d58 EFLAGS: 00010246
>> RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
>> RDX: RSI: 88006d9e8d48 RDI: 880076b9e280
>> RBP: 8801003c3dc0 R08: 000bf723 R09:
>> R10: R11: 000a R12: 0034
>> R13: 0034 R14: 02b8 R15: 05a8
>> FS: 7fc1e852a6e0() GS:8801003c() knlGS:
>> CS: e033 DS: 002b ES: 002b CR0: 8005003b
>> CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
>> DR0: DR1: DR2:
>> DR3: DR6: 0ff0 DR7: 0400
>> Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
>> Stack:
>>  8142db21 880076b9e280 8800637097f0
>>  02ec 02b8 880077ac
>>  8800637097f0 880066c9a7c0 fdb4 024c
>> Call Trace:
>>  <IRQ>
>>  [8142db21] ? skb_copy_bits+0x1c1/0x2e0
>>  [8142f173] skb_copy+0xf3/0x120
>>  [81447fbc] neigh_timer_handler+0x1ac/0x350
>>  [810573fe] ? account_idle_ticks+0xe/0x10
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107dbaa] call_timer_fn+0x4a/0x110
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107f82a] run_timer_softirq+0x13a/0x220
>>  [81075c39] __do_softirq+0xb9/0x1d0
>>  [810d9678] ? handle_percpu_irq+0x48/0x70
>>  [81511d3c] call_softirq+0x1c/0x30
>>  [810172e5] do_softirq+0x65/0xa0
>>  [8107656b] irq_exit+0xab/0xc0
>>  [812f97d5] xen_evtchn_do_upcall+0x35/0x50
>>  [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
>>  <EOI>
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [8100a0b0] ? xen_safe_halt+0x10/0x20
>>  [8101dfeb] ? default_idle+0x5b/0x170
>>  [81014ac6] ? cpu_idle+0xc6/0xf0
>>  [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
>>  [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
>> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43
>> 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1
>> f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
>> RIP [812605bb] memcpy+0xb/0x120
>>  RSP 8801003c3d58
>> CR2: 88006d9e8d48
>>
>> Reviewing the vmcore, I found that skb->users is 1 at that moment.
>> Checking the neighbour code history, I found that skb_get() was replaced
>> by skb_copy() in commit 7e36763b2c:
>>
>> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
>> Author: Frank Blaschka
>> Date:   Mon Mar 3 12:16:04 2008 -0800
>>
>>     [NET]: Fix race in generic address resolution.
>>
>>     neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
>>     has increased skbs refcount and calls solicit with the
kernel panic in skb_copy_bits
Hi,

When we run a failover test with iSCSI + multipath by resetting the switches
on OVM (2.6.39), we hit this panic:

BUG: unable to handle kernel paging request at 88006d9e8d48
IP: [812605bb] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
Oops: [#1] SMP
CPU 7
Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback
xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding
be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio
dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs
xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper
drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event
snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer
snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi
dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed
dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc
scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase
scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache

Pid: 0, comm: swapper Tainted: GW 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
RIP: e030:[812605bb] [812605bb] memcpy+0xb/0x120
RSP: e02b:8801003c3d58 EFLAGS: 00010246
RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
RDX: RSI: 88006d9e8d48 RDI: 880076b9e280
RBP: 8801003c3dc0 R08: 000bf723 R09:
R10: R11: 000a R12: 0034
R13: 0034 R14: 02b8 R15: 05a8
FS: 7fc1e852a6e0() GS:8801003c() knlGS:
CS: e033 DS: 002b ES: 002b CR0: 8005003b
CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
Stack:
 8142db21 880076b9e280 8800637097f0
 02ec 02b8 880077ac
 8800637097f0 880066c9a7c0 fdb4 024c
Call Trace:
 <IRQ>
 [8142db21] ? skb_copy_bits+0x1c1/0x2e0
 [8142f173] skb_copy+0xf3/0x120
 [81447fbc] neigh_timer_handler+0x1ac/0x350
 [810573fe] ? account_idle_ticks+0xe/0x10
 [81447e10] ? neigh_alloc+0x180/0x180
 [8107dbaa] call_timer_fn+0x4a/0x110
 [81447e10] ? neigh_alloc+0x180/0x180
 [8107f82a] run_timer_softirq+0x13a/0x220
 [81075c39] __do_softirq+0xb9/0x1d0
 [810d9678] ? handle_percpu_irq+0x48/0x70
 [81511d3c] call_softirq+0x1c/0x30
 [810172e5] do_softirq+0x65/0xa0
 [8107656b] irq_exit+0xab/0xc0
 [812f97d5] xen_evtchn_do_upcall+0x35/0x50
 [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [810013aa] ? xen_hypercall_sched_op+0xa/0x20
 [810013aa] ? xen_hypercall_sched_op+0xa/0x20
 [8100a0b0] ? xen_safe_halt+0x10/0x20
 [8101dfeb] ? default_idle+0x5b/0x170
 [81014ac6] ? cpu_idle+0xc6/0xf0
 [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
 [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43
4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1
f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [812605bb] memcpy+0xb/0x120
 RSP 8801003c3d58
CR2: 88006d9e8d48

Reviewing the vmcore, I found that skb->users is 1 at that moment. Checking
the neighbour code history, I found that skb_get() was replaced by skb_copy()
in commit 7e36763b2c:

commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
Author: Frank Blaschka
Date:   Mon Mar 3 12:16:04 2008 -0800

    [NET]: Fix race in generic address resolution.

    neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
    has increased skbs refcount and calls solicit with the skb.
    neigh_timer_handler should not increase skbs refcount but make a copy
    of the skb and do solicit with the copy.

    Signed-off-by: Frank Blaschka
    Signed-off-by: David S. Miller

So can you please give some details of the race? Per the vmcore, it seems
the skb data had already been freed; I suspect an skb_get() was lost
somewhere. With the above commit reverted, the panic did not occur during
our testing.

Any input will be appreciated!

Best Regards,
Joe
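The difference between the two approaches named in commit 7e36763b2c (taking an extra reference, as skb_get() does, versus making a private copy, as skb_copy() does) can be sketched with a small reference-counted buffer. This is a userspace illustration only: struct buf and the buf_* helpers are hypothetical and do not model the real sk_buff layout. It shows why a copy must read the source data at copy time, which is where the memcpy() in the trace above faulted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer, loosely modelled on skb->users. */
struct buf {
	unsigned int users;
	size_t len;
	unsigned char *data;
};

static struct buf *buf_alloc(const void *src, size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	memcpy(b->data, src, len);	/* the source must be valid right here */
	b->len = len;
	b->users = 1;
	return b;
}

/* Share: bump the count; both holders see the same data. */
static struct buf *buf_get(struct buf *b)
{
	b->users++;
	return b;
}

/* Copy: a new, independent buffer; reads b->data at the time of the call. */
static struct buf *buf_copy(const struct buf *b)
{
	return buf_alloc(b->data, b->len);
}

/* Drop one reference; free when the last holder lets go. */
static void buf_put(struct buf *b)
{
	assert(b->users > 0);
	if (--b->users == 0) {
		free(b->data);
		free(b);
	}
}
```

With buf_get() the data stays alive as long as any holder exists; with buf_copy() the copy outlives the original, but only if the original's data was still intact when the copy was taken.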
Re: [PATCH] ACPI: update user_policy.max when _PPC updated
On 06/07/13 03:54, Rafael J. Wysocki wrote:
> Do you mean you set a limit in the BIOS setup and the kernel changed that
> limit on boot?

Sorry for the confusion.

The issue is this: if we disable hardcap before the kernel boots, then once
the kernel is up, any change of _PPC updates scaling_max_freq properly.

If we enable hardcap before the kernel boots, then once the kernel is up,
even if we disable hardcap, scaling_max_freq is not updated to the maximum
frequency; the maximum stays at the value it had at boot.

Reviewing the related code, I found that the limit comes from
user_policy.max. That is, if user_policy.max is set to 1000MHz at boot, no
later change of _PPC can raise scaling_max_freq. I think this is not as
expected? Please advise.

Thanks,
Joe
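The behaviour described above can be modelled in a few lines. This is a simplified sketch based only on the description in this message, not on the cpufreq source: struct policy_model and both functions are hypothetical, and the assumption is that the stored user limit is seeded from the platform limit in effect at boot and that each _PPC change recomputes the effective maximum as the smaller of the two.

```c
#include <assert.h>

/* Hypothetical model of the two limits discussed in this thread. */
struct policy_model {
	unsigned int user_max_khz;	/* plays the role of user_policy.max */
	unsigned int max_khz;		/* plays the role of scaling_max_freq */
};

static unsigned int min_khz(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Assumption: at boot the user limit starts as the platform limit. */
static void policy_boot(struct policy_model *p, unsigned int platform_limit_khz)
{
	assert(platform_limit_khz > 0);
	p->user_max_khz = platform_limit_khz;
	p->max_khz = platform_limit_khz;
}

/* On a _PPC change the effective max is re-derived from the stored user max. */
static void policy_ppc_changed(struct policy_model *p,
			       unsigned int platform_limit_khz)
{
	p->max_khz = min_khz(p->user_max_khz, platform_limit_khz);
}
```

In this model, booting with a 1000000 kHz cap and later lifting the platform limit to 2000000 kHz leaves max_khz at 1000000, while booting uncapped lets max_khz follow the platform limit down and back up again.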
Re: [PATCH] ACPI: update user_policy.max when _PPC updated
On 06/06/13 19:06, Rafael J. Wysocki wrote:
> On Thursday, June 06, 2013 08:27:08 AM Joe Jin wrote:
>> On 06/06/13 04:40, Rafael J. Wysocki wrote:
>>> On Wednesday, June 05, 2013 08:52:52 AM Joe Jin wrote:
>>>> When _PPC changes dynamically, user_policy.max is not updated;
>>>> this prevents the CPU from running at the highest frequency.
>>>
>>> Why should the user setting be always related to the current maximum
>>> available frequency?  What if the user sets the limit for power capping
>>> purposes?
>>
>> cpufreq_update_policy() gets policy->max from user_policy.max:
>>
>> int cpufreq_update_policy(unsigned int cpu)
>> {
>> [...]
>>         policy.min = data->user_policy.min;
>>         policy.max = data->user_policy.max;
>>         policy.policy = data->user_policy.policy;
>>         policy.governor = data->user_policy.governor;
>> [...]
>>         ret = __cpufreq_set_policy(data, &policy);
>> [...]
>>
>> /sys/devices/system/cpu/cpu$/cpufreq/scaling_max_freq uses policy->max
>> and user_policy->max when it is updated, so I think _PPC changes also
>> need to update these two?
>
> Yes, if policy.max happens to be greater than the maximum available frequency,
> then (and only then) it probably should be updated.  It should never be bumped
> up, though.

Does this mean that if I enable hardcap before the kernel boots, and disable
it later once the system is up, I have to raise the max frequency manually?

Thanks,
Joe

>
> Thanks,
> Rafael
>
>
>>>> Signed-off-by: Joe Jin
>>>> Cc: Rafael J. Wysocki
>>>> Cc: Viresh Kumar
>>>> ---
>>>>  drivers/acpi/processor_perflib.c | 17 ++++++++++++++++-
>>>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
>>>> index e854582..e01aa7d 100644
>>>> --- a/drivers/acpi/processor_perflib.c
>>>> +++ b/drivers/acpi/processor_perflib.c
>>>> @@ -180,6 +180,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, int status)
>>>>  int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
>>>>  {
>>>>  	int ret;
>>>> +	unsigned int saved = (unsigned int)pr->performance_platform_limit;
>>>>
>>>>  	if (ignore_ppc) {
>>>>  		/*
>>>> @@ -204,8 +205,22 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
>>>>  	}
>>>>  	if (ret < 0)
>>>>  		return (ret);
>>>> -	else
>>>> +	else {
>>>> +		unsigned int ppc = (unsigned int)pr->performance_platform_limit;
>>>> +
>>>> +		if (saved != ppc) {
>>>> +			struct cpufreq_policy *policy;
>>>> +
>>>> +			policy = cpufreq_cpu_get(pr->id);
>>>> +			if (likely(policy))
>>>> +				policy->user_policy.max =
>>>> +					pr->performance->states[ppc].
>>>> +					core_frequency * 1000;
>>>> +			cpufreq_cpu_put(policy);
>>>> +		}
>>>> +
>>>>  		return cpufreq_update_policy(pr->id);
>>>> +	}
>>>>  }
>>>>
>>>>  int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
>>>>
>>
>>
>>

--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
Re: [PATCH] ACPI: update user_policy.max when _PPC updated
On 06/06/13 04:40, Rafael J. Wysocki wrote:
> On Wednesday, June 05, 2013 08:52:52 AM Joe Jin wrote:
>> When _PPC changes dynamically, user_policy.max is not updated,
>> which prevents the CPU from running at the highest frequency.
>
> Why should the user setting be always related to the current maximum available
> frequency? What if the user sets the limit for power capping purposes?

cpufreq_update_policy() gets policy->max from user_policy.max:

int cpufreq_update_policy(unsigned int cpu)
{
[...]
	policy.min = data->user_policy.min;
	policy.max = data->user_policy.max;
	policy.policy = data->user_policy.policy;
	policy.governor = data->user_policy.governor;
[...]
	ret = __cpufreq_set_policy(data, &policy);
[...]

/sys/devices/system/cpu/cpu$/cpufreq/scaling_max_freq uses policy->max
and user_policy->max when it is updated, so I think _PPC changes also need
to update these two?

Thanks,
Joe

>
> Rafael
>
>
>> Signed-off-by: Joe Jin
>> Cc: Rafael J. Wysocki
>> Cc: Viresh Kumar
>> ---
>>  drivers/acpi/processor_perflib.c | 17 ++++++++++++++++-
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
>> index e854582..e01aa7d 100644
>> --- a/drivers/acpi/processor_perflib.c
>> +++ b/drivers/acpi/processor_perflib.c
>> @@ -180,6 +180,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, int status)
>>  int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
>>  {
>>  	int ret;
>> +	unsigned int saved = (unsigned int)pr->performance_platform_limit;
>>
>>  	if (ignore_ppc) {
>>  		/*
>> @@ -204,8 +205,22 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
>>  	}
>>  	if (ret < 0)
>>  		return (ret);
>> -	else
>> +	else {
>> +		unsigned int ppc = (unsigned int)pr->performance_platform_limit;
>> +
>> +		if (saved != ppc) {
>> +			struct cpufreq_policy *policy;
>> +
>> +			policy = cpufreq_cpu_get(pr->id);
>> +			if (likely(policy))
>> +				policy->user_policy.max =
>> +					pr->performance->states[ppc].
>> +						core_frequency * 1000;
>> +			cpufreq_cpu_put(policy);
>> +		}
>> +
>>  		return cpufreq_update_policy(pr->id);
>> +	}
>>  }
>>
>>  int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
>>
[PATCH] ACPI: update user_policy.max when _PPC updated
When _PPC changes dynamically, user_policy.max is not updated,
which prevents the CPU from running at the highest frequency.

Signed-off-by: Joe Jin
Cc: Rafael J. Wysocki
Cc: Viresh Kumar
---
 drivers/acpi/processor_perflib.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index e854582..e01aa7d 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -180,6 +180,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, int status)
 int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
 {
 	int ret;
+	unsigned int saved = (unsigned int)pr->performance_platform_limit;

 	if (ignore_ppc) {
 		/*
@@ -204,8 +205,22 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
 	}
 	if (ret < 0)
 		return (ret);
-	else
+	else {
+		unsigned int ppc = (unsigned int)pr->performance_platform_limit;
+
+		if (saved != ppc) {
+			struct cpufreq_policy *policy;
+
+			policy = cpufreq_cpu_get(pr->id);
+			if (likely(policy))
+				policy->user_policy.max =
+					pr->performance->states[ppc].
+						core_frequency * 1000;
+			cpufreq_cpu_put(policy);
+		}
+
 		return cpufreq_update_policy(pr->id);
+	}
 }

 int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
--
1.8.1.4
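To make the effect of this patch easier to reason about, here is a minimal standalone model of its logic. It is only a sketch: `struct policy_model` and `ppc_changed_patched()` are hypothetical names rather than kernel symbols, frequencies are in kHz, and the final clamp stands in for what the thread says cpufreq_update_policy() and the _PPC limit do together.

```c
/* Hypothetical model of the two fields the patch touches. */
struct policy_model {
	unsigned int user_max_khz;	/* stands in for user_policy.max */
	unsigned int max_khz;		/* stands in for policy->max */
};

/*
 * Models the patched acpi_processor_ppc_has_changed(): whenever the
 * platform limit changes, the user ceiling is overwritten with the new
 * limit before the policy is re-evaluated.
 */
static void ppc_changed_patched(struct policy_model *p,
				unsigned int old_limit_khz,
				unsigned int new_limit_khz)
{
	if (old_limit_khz != new_limit_khz)
		p->user_max_khz = new_limit_khz;

	/* Re-evaluate: effective max is the user ceiling capped by the limit. */
	p->max_khz = p->user_max_khz < new_limit_khz ?
		     p->user_max_khz : new_limit_khz;
}
```

In this model a ceiling captured at 1000000 follows the platform limit up to 2000000, which is the case the patch targets. A ceiling of 800000 chosen deliberately by the user is also replaced on the next _PPC change, which is the power-capping concern raised in the replies.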
Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
Hi Yijing,

Thanks for the reference. The patch looks good to me, but I have no chance to
test it in the customer's environment.

Best Regards,
Joe

On 12/19/12 13:52, Yijing Wang wrote:
> On 2012/12/19 11:04, Joe Jin wrote:
>> Hi all,
>>
>> I backported the MPS commits and asked the customer to pass
>> "pci=pcie_bus_peer2peer" to the kernel to limit MPS to 128, and the issue
>> disappeared. It sounds like this is a BIOS bug.
>>
>
> Hi Joe,
> I found a similar problem when I did PCI hotplug; the discussion is here:
> http://marc.info/?l=linux-pci&m=134810569924220&w=2.
> We are trying to improve the Linux kernel so this problem is easier to debug,
> based on Bjorn's suggestion. Jon sent out the first version of the patch:
> http://marc.info/?l=linux-pci&m=135002016005274&w=2.
> I think we can go further here:
> http://marc.info/?l=linux-pci&m=135115581307869&w=2. I hope this information
> can help you.
>
> Thanks!
> Yijing.
>
>> Thanks for all of your help.
>>
>> Best Regards,
>> Joe
>>
>> On 11/29/12 23:52, Fujinaka, Todd wrote:
>>> Someone else pointed this out to me locally. If you have a non-client BIOS,
>>> you should be able to set the MaxPayloadSize using setpci. You have to make
>>> sure that you're being consistent throughout all the associated links.
>>>
>>> Todd Fujinaka
>>> Technical Marketing Engineer
>>> LAN Access Division (LAD)
>>> Intel Corporation
>>> todd.fujin...@intel.com
>>> (503) 712-4565
>>>
>>>
>>> -----Original Message-----
>>> From: Ethan Zhao [mailto:ethan.ker...@gmail.com]
>>> Sent: Wednesday, November 28, 2012 7:10 PM
>>> To: Fujinaka, Todd
>>> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; net...@vger.kernel.org;
>>> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>
>>> Joe,
>>> Possibly your customer is running a kernel without source code on a
>>> platform whose vendor does not want to fix the BIOS issue (is that an
>>> HP/Dell server?).
>>> Anyway, to see whether it is a payload issue or not, you could change the
>>> payload size of those devices with the setpci tool and set the link retrain
>>> bit to trigger link retraining, to debug the issue and identify the root
>>> cause. I think it is much easier than modifying the BIOS or the EEPROM of
>>> the NIC.
>>>
>>> e.g.
>>>    set device control register to 0f 00 (128 bytes payload size)
>>>    # setpci -v -s 00:02.0 98.w=000f
>>>    set device link control register to 60h (retrain the link)
>>>    # setpci -v -s 00:02.0 a0.b=60
>>>
>>> Hope it works, just my 2 cents.
>>>
>>> ethan.z...@oracle.com
>>>
>>> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd wrote:
>>>> The only EEPROM I know about or can speak to is the one attached to the
>>>> 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>>>
>>>> Todd Fujinaka
>>>> Technical Marketing Engineer
>>>> LAN Access Division (LAD)
>>>> Intel Corporation
>>>> todd.fujin...@intel.com
>>>> (503) 712-4565
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Joe Jin [mailto:joe@oracle.com]
>>>> Sent: Wednesday, November 28, 2012 12:31 AM
>>>> To: Ben Hutchings
>>>> Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org;
>>>> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>>
>>>> On 11/28/12 02:10, Ben Hutchings wrote:
>>>>> On Tue, 2012-11-27 at 17:32 +, Fujinaka, Todd wrote:
>>>>>> Forgive me if I'm being too repetitious as I think some of this has
>>>>>> been mentioned in the past.
>>>>>>
>>>>>> We (and by we I mean the Ethernet part and driver) can only change
>>>>>> the advertised availability of a larger MaxPayloadSize. The size is
>>>>>> negotiated by both sides of the link when the link is established.
>>>>>> The driver should not change the size of the link as it would be
>>>>>> poking at registers outside of its scope and is controlled by the
>>>>>> upstream bridge (not us).
>>>>> [...]
>>>>>
>>>>> MaxPayloadSize (MPS) is not negotiated between devices but is
>>>>> programm
Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
Hi all,

I backported the MPS commits and asked the customer to pass
"pci=pcie_bus_peer2peer" to the kernel to limit MPS to 128, and the issue
disappeared. It sounds like this is a BIOS bug.

Thanks for all of your help.

Best Regards,
Joe

On 11/29/12 23:52, Fujinaka, Todd wrote:
> Someone else pointed this out to me locally. If you have a non-client BIOS,
> you should be able to set the MaxPayloadSize using setpci. You have to make
> sure that you're being consistent throughout all the associated links.
>
> Todd Fujinaka
> Technical Marketing Engineer
> LAN Access Division (LAD)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
>
> -----Original Message-----
> From: Ethan Zhao [mailto:ethan.ker...@gmail.com]
> Sent: Wednesday, November 28, 2012 7:10 PM
> To: Fujinaka, Todd
> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; net...@vger.kernel.org;
> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>
> Joe,
> Possibly your customer is running a kernel without source code on a
> platform whose vendor does not want to fix the BIOS issue (is that an
> HP/Dell server?).
> Anyway, to see whether it is a payload issue or not, you could change the
> payload size of those devices with the setpci tool and set the link retrain
> bit to trigger link retraining, to debug the issue and identify the root
> cause. I think it is much easier than modifying the BIOS or the EEPROM of
> the NIC.
>
> e.g.
>    set device control register to 0f 00 (128 bytes payload size)
>    # setpci -v -s 00:02.0 98.w=000f
>    set device link control register to 60h (retrain the link)
>    # setpci -v -s 00:02.0 a0.b=60
>
> Hope it works, just my 2 cents.
>
> ethan.z...@oracle.com
>
> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd wrote:
>> The only EEPROM I know about or can speak to is the one attached to the
>> 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>
>> Todd Fujinaka
>> Technical Marketing Engineer
>> LAN Access Division (LAD)
>> Intel Corporation
>> todd.fujin...@intel.com
>> (503) 712-4565
>>
>>
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Wednesday, November 28, 2012 12:31 AM
>> To: Ben Hutchings
>> Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org;
>> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>
>> On 11/28/12 02:10, Ben Hutchings wrote:
>>> On Tue, 2012-11-27 at 17:32 +, Fujinaka, Todd wrote:
>>>> Forgive me if I'm being too repetitious as I think some of this has
>>>> been mentioned in the past.
>>>>
>>>> We (and by we I mean the Ethernet part and driver) can only change
>>>> the advertised availability of a larger MaxPayloadSize. The size is
>>>> negotiated by both sides of the link when the link is established.
>>>> The driver should not change the size of the link as it would be
>>>> poking at registers outside of its scope and is controlled by the
>>>> upstream bridge (not us).
>>> [...]
>>>
>>> MaxPayloadSize (MPS) is not negotiated between devices but is
>>> programmed by the system firmware (at least for devices present at
>>> boot - the kernel may be responsible in case of hotplug). You can
>>> use the kernel parameter 'pci=pcie_bus_perf' (or one of several
>>> others) to set a policy that overrides this, but no policy will allow
>>> setting MPS above the device's MaxPayloadSizeSupported (MPSS).
>>>
>>
>> Ben,
>>
>> Unfortunately I'm using a 3.0.x kernel, and this is not included in that
>> kernel. So I'm trying to use ethtool to modify it in the EEPROM to see
>> whether that helps.
>>
>>
>> Todd, I'll review MaxPayload for all devices, but I have to say that if
>> there is a mismatch, the customer cannot change it in the BIOS because
>> there is no entry for it there. To test it, we have to find a way to verify
>> whether this is the root cause, so we still need to find the offset in the
>> EEPROM.
>>
>> Thanks in advance,
>> Joe
>>
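The setpci values quoted above can be decoded with a short sketch. It assumes the standard PCIe register layout, where Max_Payload_Size occupies bits 7:5 of the Device Control register and Retrain Link is bit 5 of the Link Control register; the function names are hypothetical. The offsets 0x98 and 0xa0 in the example would correspond to a PCI Express capability at 0x90 on that particular device, so they are not portable.

```c
#include <stdint.h>

/*
 * Decode Max_Payload_Size from a PCIe Device Control register value.
 * Bits 7:5 hold an encoding n, meaning 128 << n bytes (n = 0..5).
 * Returns 0 for the reserved encodings 6 and 7.
 */
static unsigned int mps_bytes_from_devctl(uint16_t devctl)
{
	unsigned int n = (devctl >> 5) & 0x7;

	if (n > 5)
		return 0;
	return 128u << n;
}

/* Retrain Link is bit 5 (0x20) of the Link Control register. */
static int linkctl_requests_retrain(uint8_t linkctl_low_byte)
{
	return (linkctl_low_byte & 0x20) != 0;
}
```

With these definitions, the value 0x000f from the example decodes to 128 bytes, and 0x60 has the retrain bit set (bit 6, also set in that value, is Common Clock Configuration).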
Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
On 11/28/12 02:10, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 17:32 +, Fujinaka, Todd wrote:
>> Forgive me if I'm being too repetitious as I think some of this has
>> been mentioned in the past.
>>
>> We (and by we I mean the Ethernet part and driver) can only change the
>> advertised availability of a larger MaxPayloadSize. The size is
>> negotiated by both sides of the link when the link is established. The
>> driver should not change the size of the link as it would be poking at
>> registers outside of its scope and is controlled by the upstream
>> bridge (not us).
> [...]
>
> MaxPayloadSize (MPS) is not negotiated between devices but is programmed
> by the system firmware (at least for devices present at boot - the
> kernel may be responsible in case of hotplug). You can use the kernel
> parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy
> that overrides this, but no policy will allow setting MPS above the
> device's MaxPayloadSizeSupported (MPSS).
>

Ben,

Unfortunately I'm using a 3.0.x kernel, and this is not included in that
kernel. So I'm trying to use ethtool to modify it in the EEPROM to see
whether that helps.

Todd, I'll review MaxPayload for all devices, but I have to say that if there
is a mismatch, the customer cannot change it in the BIOS because there is no
entry for it there. To test it, we have to find a way to verify whether this
is the root cause, so we still need to find the offset in the EEPROM.

Thanks in advance,
Joe
Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
On 11/27/12 00:23, Fujinaka, Todd wrote:
> If you look at the previous section, DevCap, you'll see that it's
> correctly advertising 256 bytes but the system is negotiating 128 for
> the link to the Ethernet controller. Things on the "other" side of the
> link are controlled outside of the e1000 driver.
>
> Tushar's first suggestion was to check the PCIe payload settings in the
> entire chain. Have you done that? Mismatches will cause hangs.

Hi Todd,

So far I have been trying to work out how to modify the max payload size. Since the BIOS has no entry to change it, I have to use ethtool, and for that I need the offset of the MaxPayload size in the EEPROM. I tried to find it in Intel's online documentation but failed. Any ideas?

Thanks in advance,
Joe
Re: 82571EB: Detected Hardware Unit Hang
On 11/20/12 16:59, Dave, Tushar N wrote:
> Have you power off the system completely after modifying eeprom? If not
> please do so.

Hi Tushar,

That does not seem to work for me. Would you please help check what is wrong with my steps?

Original eeprom dump:

# ethtool -e eth3 | head -8
Offset          Values
------          ------
0x0000          00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010          57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020          08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030          f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06
                            ^
0x0040          08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00
0x0050          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# lspci -s :52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                                   ^
<--snip-->

# ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x0007 (7)
        Link detected: yes

# ethtool -E eth3 magic 0x10a48086 offset 0x34 value 0xa7
# ethtool -e eth3 | head -8
Offset          Values
------          ------
0x0000          00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010          57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020          08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030          f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06
                            ^ <== a6 --> a7
0x0040          08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00
0x0050          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# reboot

# ethtool -e eth3 | head -8
Offset          Values
------          ------
0x0000          00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010          57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020          08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030          f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06
0x0040          08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00
0x0050          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# lspci -s :52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                                   ^
                DevSta: CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
<--snip-->

# ethtool -E eth3 magic 0x10a48086 offset 0x35 value 0x17
# ethtool -e eth3 | head -8
Offset          Values
------          ------
0x0000          00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010          57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020          08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030          f6 6c b0 37 a6 17 03 84 83 07 00 00 03 c3 02 06
                               ^<== 07 -> 17
0x0040          08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00
0x0050          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# reboot

# ethtool -e eth3 | head -8
Offset          Values
------          ------
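To see which EEPROM bits the two `ethtool -E` writes above actually touched, a small sketch like the following may help. It assumes the byte-addressed dump stores each 16-bit word little-endian (byte 0x34 is the low half of word 0x1A, byte 0x35 the high half), which matches reading the bytes `a6 07` as 0x07a6. The helper name `changed_word_bits` is invented for illustration.

```python
def changed_word_bits(byte_offset, old, new):
    """Map a one-byte EEPROM change to (word offset, changed bit positions).

    Assumes 16-bit little-endian words: an even byte offset is the low
    byte of the word, the following odd offset is the high byte.
    """
    word_offset = byte_offset // 2
    shift = 8 * (byte_offset % 2)
    diff = (old ^ new) << shift
    return word_offset, [bit for bit in range(16) if (diff >> bit) & 1]


if __name__ == "__main__":
    # First write in the message: offset 0x34, a6 -> a7
    print(changed_word_bits(0x34, 0xA6, 0xA7))
    # Second write in the message: offset 0x35, 07 -> 17
    print(changed_word_bits(0x35, 0x07, 0x17))
```

Under that assumption both writes land in word 0x1A, but on bit 0 and bit 12 respectively, not on bit 8, which is the bit Tushar named for the payload size. Bit 8 corresponds to the lowest bit of the byte at offset 0x35, and that bit is already 1 in the original value 0x07.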
Re: 82571EB: Detected Hardware Unit Hang
On 11/16/12 04:26, Dave, Tushar N wrote:
>> Would you please help to find the offset of max payload size in eeprom?
>> I'd like to have a try to modify it by ethtool.
>
> It is defined using bit 8 of word 0x1A.
> Bit value 0 = 128B , bit value 1 = 256B

Hi Tushar,

I checked one of my servers whose Max Payload Size is 128:

# lspci -vvv -s 52:00.1
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
        Subsystem: Intel Corporation PRO/1000 PT Quad Port Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 266
        Region 0: Memory at dfea (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at dfe8 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 6020 [size=32]
        [virtual] Expansion ROM at d812 [disabled] [size=128K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: fee0  Data: 409a
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-15-17-ff-ff-16-ed-86
        Kernel driver in use: e1000e
        Kernel modules: e1000e

And the eeprom dump is as below:

Offset          Values
------          ------
0x0000          00 15 17 16 ed 86 24 05 ff ff a2 50 ff ff ff ff
0x0010          57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020          08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030          f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06
0x0040          08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00
0x0050          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060          00 01 00 40 1e 12 07 40 00 01 00 40 ff ff ff ff

If I have not misunderstood, the value at word offset 0x1a is 0x07a6, so bit 8 is 1, but my NIC's MPS is 128 bytes. Am I getting something wrong?

Thanks,
Joe
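Joe's reading of word 0x1A can be reproduced with a short sketch. It assumes the byte-addressed dump holds 16-bit little-endian words, so word 0x1A sits at byte offsets 0x34 and 0x35, and it takes the bit meaning straight from Tushar's reply (bit 8: 0 = 128B, 1 = 256B). The names `eeprom_word` and `payload_from_word` are illustrative only.

```python
# First four rows (0x0000 to 0x003f) of the EEPROM dump quoted above.
DUMP = bytes.fromhex(
    "00 15 17 16 ed 86 24 05 ff ff a2 50 ff ff ff ff"
    "57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1"
    "08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01"
    "f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06"
)


def eeprom_word(dump, word_offset):
    """Read one 16-bit word, assuming little-endian byte order in the dump."""
    low = dump[2 * word_offset]
    high = dump[2 * word_offset + 1]
    return (high << 8) | low


def payload_from_word(word):
    """Bit 8 of word 0x1A per Tushar: 0 means 128 bytes, 1 means 256 bytes."""
    return 256 if (word >> 8) & 1 else 128


if __name__ == "__main__":
    word = eeprom_word(DUMP, 0x1A)
    print(hex(word), payload_from_word(word))
```

Under these assumptions the EEPROM bit already selects 256 bytes, which agrees with the 256 bytes shown in DevCap; the 128 bytes in DevCtl would then have to come from somewhere other than this EEPROM bit.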
Re: 82571EB: Detected Hardware Unit Hang
On 11/14/12 11:45, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Tuesday, November 13, 2012 6:48 PM
>> To: Dave, Tushar N
>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; Mary Mcgrath
>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>
>> On 11/09/12 04:35, Dave, Tushar N wrote:
>>> All devices in path from root complex to 82571, should have *same* max
>>> payload size otherwise it can cause hang.
>>> Can you double check this?
>>
>> Hi Tushar,
>>
>> Checked with hardware vendor and they said no way to modify the max
>> payload size from BIOS, can I modify it from driver side?
>
> If you want to change value for 82571 device you can do it from eeprom but
> for other upstream devices I am not sure. I will check with my team.

Hi Tushar,

Would you please help me find the offset of the max payload size in the EEPROM? I'd like to try modifying it with ethtool.

Thanks in advance,
Joe
Re: 82571EB: Detected Hardware Unit Hang
On 11/09/12 04:35, Dave, Tushar N wrote:
> All devices in path from root complex to 82571, should have *same* max
> payload size otherwise it can cause hang.
> Can you double check this?

Hi Tushar,

I checked with the hardware vendor and they said there is no way to modify the max payload size from the BIOS. Can I modify it from the driver side?

Thanks,
Joe
Re: 82571EB: Detected Hardware Unit Hang
On 11/09/12 04:35, Dave, Tushar N wrote:
> Are you sure this is not similar issue as before that you reported.
> i.e.

Tushar,

Thanks for your quick response. I'll check with the customer whether they can modify the max payload size from the BIOS. This time the issue hit an HP server.

Thanks again,
Joe

> On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote:
>> > I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when
>> > doing scp test. this issue is easy do reproduced on SUN FIRE X2270 M2,
>> > just copy a big file (>500M) from another server will hit it at once.
> All devices in path from root complex to 82571, should have *same* max
> payload size otherwise it can cause hang.
> Can you double check this?
>

--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
82571EB: Detected Hardware Unit Hang
Hi list,

IHAC reported "82571EB Detected Hardware Unit Hang" on an HP ProLiant DL360 G6, and they have to reboot the server to recover:

e1000e :06:00.1: eth3: Detected Hardware Unit Hang:
  TDH                  <1a>
  TDT                  <1a>
  next_to_use          <1a>
  next_to_clean        <18>
buffer_info[next_to_clean]:
  time_stamp           <10047a74e>
  next_to_watch        <18>
  jiffies              <10047a88c>
  next_to_watch.status <1>
MAC Status             <80383>
PHY Status             <792d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>

With newer kernel 2.0.0.1 the issue is still reproducible.

Device info:

06:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
06:00.1 0200: 8086:10bc (rev 06)

I compared the lspci output before and after the issue; the difference is as below:

 06:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
         Subsystem: Hewlett-Packard Company NC364T PCI Express Quad Port Gigabit Server Adapter
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx-
-        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+

Would you please help with it?

Thanks in advance,
Joe

--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
[PATCH] qla3xxx: Ensure request/response queue addr writes to the registers
Before using the request and response queue addresses, make sure they have been written to the registers.

Signed-off-by: Joe Jin
Cc: Jitendra Kalsaria
Cc: Ron Mercer
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index df09b1c..6407d0d 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2525,6 +2525,13 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
 	qdev->req_q_size =
 	    (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
 
+	qdev->rsp_q_size = NUM_RSP_Q_ENTRIES * sizeof(struct net_rsp_iocb);
+
+	/* The barrier is required to ensure request and response queue
+	 * addr writes to the registers.
+	 */
+	wmb();
+
 	qdev->req_q_virt_addr =
 	    pci_alloc_consistent(qdev->pdev,
 				 (size_t) qdev->req_q_size,
@@ -2536,8 +2543,6 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
 		return -ENOMEM;
 	}
 
-	qdev->rsp_q_size = NUM_RSP_Q_ENTRIES * sizeof(struct net_rsp_iocb);
-
 	qdev->rsp_q_virt_addr =
 	    pci_alloc_consistent(qdev->pdev,
 				 (size_t) qdev->rsp_q_size,
--
1.7.11.7
[PATCH] qla3xxx: Ensure request/response queue addr writes to the registers
Before using the request and response queue addresses, make sure they have been written to the registers.

Signed-off-by: Joe Jin
Cc: Jitendra Kalsaria
Cc: Ron Mercer
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index df09b1c..f745ade 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2525,6 +2525,12 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
 	qdev->req_q_size =
 	    (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
 
+	/*
+	 * The barrier is required to ensure request and response queue
+	 * addr writes to the registers.
+	 */
+	wmb();
+
 	qdev->req_q_virt_addr =
 	    pci_alloc_consistent(qdev->pdev,
 				 (size_t) qdev->req_q_size,
--
1.7.11.7
Re: [PATCH] qla3xxx: Ensure req_q_phy_addr writes to the register
On 10/18/12 01:45, Jitendra Kalsaria wrote:
>
>
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Tuesday, October 16, 2012 11:32 PM
>> To: Ron Mercer; Jitendra Kalsaria; Dept-Eng Linux Driver
>> Cc: netdev; linux-kernel; Greg Marsden
>> Subject: [PATCH] qla3xxx: Ensure req_q_phy_addr writes to the register
>>
>> Make sure req_q_phy_addr write to the register.
>>
>> Signed-off-by: Joe Jin
>> Cc: Ron Mercer
>> Cc: Jitendra Kalsaria
>> ---
>>  drivers/net/ethernet/qlogic/qla3xxx.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
>> index df09b1c..78b4cba 100644
>> --- a/drivers/net/ethernet/qlogic/qla3xxx.c
>> +++ b/drivers/net/ethernet/qlogic/qla3xxx.c
>> @@ -2525,6 +2525,12 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
>>  	qdev->req_q_size =
>>  	    (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
>>
>> +	/*
>> +	 * The barrier is required to ensure that req_q_phy_addr writes to
>> +	 * the memory.
>> +	 */
>> +	wmb();
>> +
>>  	qdev->req_q_virt_addr =
>>  	    pci_alloc_consistent(qdev->pdev,
>>  				 (size_t) qdev->req_q_size,
>
> Your changes only take care of the request queue but not the response queue, which
> also needs a barrier.

Jiten,

Thanks for the review!

The barrier is there to make sure the writel() calls for req_q_phy_addr and rsp_q_phy_addr in ql_adapter_initialize() reach the registers, so I think calling wmb() once is enough, but I need to update the comment. Any ideas?

Thanks,
Joe

>
> 	qdev->req_q_size =
> 	    (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
>
> 	qdev->rsp_q_size = NUM_RSP_Q_ENTRIES * sizeof(struct net_rsp_iocb);
>
> 	wmb();
>
> thanks,
> Jiten
>
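The point that a single barrier can cover both queue addresses can be illustrated outside the kernel. The following is a user-space analogy in C11 atomics, not qla3xxx code: a release fence stands in for wmb(), an atomic `doorbell` field stands in for a register written with writel(), and every name in it (`fake_adapter`, `publish_queues`, `read_queues`) is hypothetical. It only sketches the ordering idea; it says nothing about where the barrier belongs in the real driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct fake_adapter {
    uint32_t req_q_addr;   /* plain stores that must become visible first */
    uint32_t rsp_q_addr;
    atomic_uint doorbell;  /* stand-in for the device register */
};

/* Writer side: set both addresses, then ring the doorbell. */
static void publish_queues(struct fake_adapter *a, uint32_t req, uint32_t rsp)
{
    assert(a);
    a->req_q_addr = req;
    a->rsp_q_addr = rsp;
    /* One fence orders BOTH earlier stores before the doorbell store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&a->doorbell, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and fills req/rsp once the doorbell has been rung. */
static int read_queues(struct fake_adapter *a, uint32_t *req, uint32_t *rsp)
{
    assert(a && req && rsp);
    if (atomic_load_explicit(&a->doorbell, memory_order_relaxed) == 0)
        return 0;
    /* Pairs with the release fence in publish_queues(). */
    atomic_thread_fence(memory_order_acquire);
    *req = a->req_q_addr;
    *rsp = a->rsp_q_addr;
    return 1;
}
```

A reader that observes the doorbell set is guaranteed to see both addresses, even though only one fence was issued on the writer side. A second fence between the two address stores would add nothing for this pattern.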
[PATCH] qla3xxx: Ensure req_q_phy_addr writes to the register
Make sure req_q_phy_addr write to the register.

Signed-off-by: Joe Jin
Cc: Ron Mercer
Cc: Jitendra Kalsaria
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index df09b1c..78b4cba 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2525,6 +2525,12 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
 	qdev->req_q_size =
 	    (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
 
+	/*
+	 * The barrier is required to ensure that req_q_phy_addr writes to
+	 * the memory.
+	 */
+	wmb();
+
 	qdev->req_q_virt_addr =
 	    pci_alloc_consistent(qdev->pdev,
 				 (size_t) qdev->req_q_size,
--
1.7.11.7
Re: 82571EB: Detected Hardware Unit Hang
On 07/15/12 11:42, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Thursday, July 12, 2012 9:34 PM
>> To: Dave, Tushar N
>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>
>> On 07/13/12 12:10, Dave, Tushar N wrote:
>>>> -----Original Message-----
>>>> From: Joe Jin [mailto:joe@oracle.com]
>>>> Sent: Thursday, July 12, 2012 4:46 PM
>>>> To: Dave, Tushar N
>>>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-ker...@vger.kernel.org
>>>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>>>
>>> Thanks for sending the full dmesg log. I am still investigating. I think
>>> this issue can occur if two PCIe link partners (i.e. the PCIe bridge and
>>> the PCIe device) do not have the same max payload size.
>>> I need two more pieces of information:
>>> 1) PBA number of the card.
>>
>> This is a remote server and I could not get this.
>>
>>> 2) full lspci -vvv output of the entire system 'after you have changed max
>>> payload size to 128'.
>
> Somehow, setting max payload to 256 from the BIOS does not set this value for all
> devices. I believe this is a BIOS bug.
> All devices in the path from the root complex to the 82571 should have the same max
> payload size; otherwise it can cause a hang. When you set max payload to 128 from the
> BIOS, all devices in the path from the root complex to the 82571 were assigned the same
> max payload size. This resolves the issue.
>
> I hope this helps.

Tushar,

Thanks a lot for your help; I will send this to the hardware engineer.

Regards,
Joe
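The max payload size check described in this message can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names (`mps_bytes`, `path_mps_consistent`) are hypothetical, and the caller is assumed to have already collected the 16-bit PCIe Device Control register value of every device on the path from the root complex to the 82571 (for example from the DevCtl lines of lspci -vvv). The only hardware fact relied on is that Max_Payload_Size occupies bits 7:5 of Device Control and encodes 128 << n bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode Max_Payload_Size from a PCIe Device Control register value.
 * Bits 7:5 hold the encoding: 0 -> 128 bytes, 1 -> 256, 2 -> 512, ...
 * Encodings 6 and 7 are reserved; they are not special-cased here. */
static unsigned mps_bytes(uint16_t devctl)
{
    unsigned enc = (devctl >> 5) & 0x7;
    return 128u << enc;
}

/* Return 1 if every device on the path uses the same max payload size,
 * 0 otherwise. devctl[] holds one Device Control value per device. */
static int path_mps_consistent(const uint16_t *devctl, size_t n)
{
    assert(devctl != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        if (mps_bytes(devctl[i]) != mps_bytes(devctl[0]))
            return 0;
    }
    return 1;
}
```

For instance, a path where one device reports a Device Control value with bits 7:5 equal to 1 (256 bytes) and another reports 0 (128 bytes) would fail the check, which matches the mismatch described for the BIOS setting of 256.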
Re: 82571EB: Detected Hardware Unit Hang
On 07/12/12 13:57, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Wednesday, July 11, 2012 8:13 PM
>> To: Dave, Tushar N
>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>
>> On 07/12/12 11:07, Dave, Tushar N wrote:
>>>> -----Original Message-----
>>>> From: Joe Jin [mailto:joe@oracle.com]
>>>> Sent: Wednesday, July 11, 2012 7:58 PM
>>>> To: Dave, Tushar N
>>>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-ker...@vger.kernel.org
>>>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>>>
>>>> On 07/12/12 10:52, Dave, Tushar N wrote:
>>>>> What is the exact error messages in BIOS log?
>>>>
>>>> Error message from BIOS event log:
>>>> 07/12/12 05:54:00
>>>>    PCI Express Non-Fatal Error
>>>>
>>>> Thanks,
>>>> Joe
>> Hi Tushar,
>>
>> Please find eeprom from attachment.
>
> Do you have lspci -vvv dump of entire system before and after issue occurs?
> If you have can you send it to me?
>

Before:
05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
	Subsystem: Oracle Corporation x4 PCI-Express Quad Gigabit Ethernet UTP Low Profile Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- [...]