Re: [vdpa_sim_net] 79991caf52: net/ipv4/ipmr.c:#RCU-list_traversed_in_non-reader_section

2021-02-08 Thread Joe Jin
On 2/7/21 12:15 PM, Dongli Zhang wrote:
> Is it possible that the issue is not due to this change?

It looks like this issue is not related to your change: according to the dmesg
output, virtio was not yet loaded when the issue occurred:

[  502.508450] ------------[ cut here ]------------
[  502.511859] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/vkms/vkms_crtc.c:21 vkms_vblank_simulate+0x22a/0x240
[  502.524018] Modules linked in:
[  502.539642] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc4-00008-g79991caf5202 #1

>
> This change just calls a different API to allocate memory, which is
> equivalent to kzalloc()+vzalloc().
>
> Before the change:
>
> try kzalloc(sizeof(*vs), GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL);
>
> ... and then the following if the former fails:
>
> vzalloc(sizeof(*vs));
>
>
> After the change:
>
> try kmalloc_node(size, GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN|__GFP_NORETRY, node);
>
> ... and then the following if the former fails:
>
> __vmalloc_node(size, 1, GFP_KERNEL|__GFP_ZERO, node, __builtin_return_address(0));
>
>
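To make the before/after comparison above concrete, here is a small userspace model of the kvzalloc() fallback order. It is a sketch, not kernel code: model_kvzalloc(), the two stub allocators and MODEL_PAGE_SIZE are hypothetical, both stages are backed by calloc(), and only the GFP_KERNEL path described above is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed page size for the model; the kernel's PAGE_SIZE is arch-specific. */
#define MODEL_PAGE_SIZE 4096UL

/* Test knob: force the first stage to fail, as with no free high-order pages. */
static bool model_kmalloc_fails;

/* Records which stage satisfied the most recent request. */
enum model_backend { MODEL_NONE, MODEL_KMALLOC, MODEL_VMALLOC };
static enum model_backend model_last_backend = MODEL_NONE;

/* Stand-in for kmalloc_node(): physically contiguous, may fail fast. */
static void *model_kmalloc_zeroed(size_t size)
{
	return model_kmalloc_fails ? NULL : calloc(1, size);
}

/* Stand-in for __vmalloc_node(): only virtually contiguous. */
static void *model_vmalloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/*
 * Hypothetical model of the kvzalloc() control flow for GFP_KERNEL:
 * try the contiguous allocator first and fall back to vmalloc only if
 * that fails and the request is larger than one page.
 */
static void *model_kvzalloc(size_t size)
{
	void *p = model_kmalloc_zeroed(size);

	if (p || size <= MODEL_PAGE_SIZE) {
		model_last_backend = p ? MODEL_KMALLOC : MODEL_NONE;
		return p;
	}
	p = model_vmalloc_zeroed(size);
	model_last_backend = p ? MODEL_VMALLOC : MODEL_NONE;
	return p;
}
```

In this model a failed request of one page or less returns NULL without trying vmalloc; only larger requests take the fallback path.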
> Below is the first WARNING in the uploaded dmesg. I assume it was triggered
> before any attempt to open /dev/vhost-scsi.
>
> Will this test try to open /dev/vhost-scsi?
>
> [5.095515] =============================
> [5.095515] WARNING: suspicious RCU usage
> [5.095515] 5.11.0-rc4-00008-g79991caf5202 #1 Not tainted
> [5.095534] -----------------------------
> [5.096041] security/smack/smack_lsm.c:351 RCU-list traversed in non-reader section!!
> [5.096982]
> [5.096982] other info that might help us debug this:
> [5.096982]
> [5.097953]
> [5.097953] rcu_scheduler_active = 1, debug_locks = 1
> [5.098739] no locks held by kthreadd/2.
> [5.099237]
> [5.099237] stack backtrace:
> [5.099537] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.11.0-rc4-00008-g79991caf5202 #1
> [5.100470] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [5.101442] Call Trace:
> [5.101807]  dump_stack+0x15f/0x1bf
> [5.102298]  smack_cred_prepare+0x400/0x420
> [5.102840]  ? security_prepare_creds+0xd4/0x120
> [5.103441]  security_prepare_creds+0x84/0x120
> [5.103515]  prepare_creds+0x3f1/0x580
> [5.103515]  copy_creds+0x65/0x480
> [5.103515]  copy_process+0x7b4/0x3600
> [5.103515]  ? check_prev_add+0xa40/0xa40
> [5.103515]  ? lockdep_enabled+0xd/0x60
> [5.103515]  ? lock_is_held_type+0x1a/0x100
> [5.103515]  ? __cleanup_sighand+0xc0/0xc0
> [5.103515]  ? lockdep_unlock+0x39/0x160
> [5.103515]  kernel_clone+0x165/0xd20
> [5.103515]  ? copy_init_mm+0x20/0x20
> [5.103515]  ? pvclock_clocksource_read+0xd9/0x1a0
> [5.103515]  ? sched_clock_local+0x99/0xc0
> [5.103515]  ? kthread_insert_work_sanity_check+0xc0/0xc0
> [5.103515]  kernel_thread+0xba/0x100
> [5.103515]  ? __ia32_sys_clone3+0x40/0x40
> [5.103515]  ? kthread_insert_work_sanity_check+0xc0/0xc0
> [5.103515]  ? do_raw_spin_unlock+0xa9/0x160
> [5.103515]  kthreadd+0x68f/0x7a0
> [5.103515]  ? kthread_create_on_cpu+0x160/0x160
> [5.103515]  ? lockdep_hardirqs_on+0x77/0x100
> [5.103515]  ? _raw_spin_unlock_irq+0x24/0x60
> [5.103515]  ? kthread_create_on_cpu+0x160/0x160
> [5.103515]  ret_from_fork+0x22/0x30
>
> Thank you very much!
>
> Dongli Zhang
>
>
> On 2/6/21 7:03 PM, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: 79991caf5202c7989928be534727805f8f68bb8d ("vdpa_sim_net: Add support for user supported devices")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
>>   
>> Dongli-Zhang/vhost-scsi-alloc-vhost_scsi-with-kvzalloc-to-avoid-delay/20210129-191605
>>
>>
>> in testcase: trinity
>> version: trinity-static-x86_64-x86_64-f93256fb_2019-08-28
>> with following parameters:
>>
>>  runtime: 300s
>>
>> test-description: Trinity is a linux system call fuzz tester.
>> test-url: http://codemonkey.org.uk/projects/trinity/
>>  
>>
>>
>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>
>>
>> +-----------------+------------+------------+
>> |                 | 39502d042a | 79991caf52 |
>> +-----------------+------------+------------+
>> | boot_successes  | 0          | 0          |
>> | boot_failures   | 62         | 57         |
>> | 

Re: [PATCH v2 1/1] vhost scsi: alloc vhost_scsi with kvzalloc() to avoid delay

2021-02-01 Thread Joe Jin
Could anyone please review this patch and give it a Reviewed-by?

Thanks,
Joe
On 1/24/21 7:12 PM, Jason Wang wrote:
>
> On 2021/1/23 4:08 PM, Dongli Zhang wrote:
>> The size of 'struct vhost_scsi' is order-10 (~2.3MB). kzalloc() may take a
>> long time to compact memory pages, retrying multiple times, when there is a
>> lack of high-order pages. As a result, there is latency when creating a VM
>> (with vhost-scsi) or hot-adding vhost-scsi-based storage.
>>
>> The prior commit 595cb754983d ("vhost/scsi: use vmalloc for order-10
>> allocation") prefers to fall back only when really needed, while this patch
>> allocates with kvzalloc(), with __GFP_NORETRY implicitly set, to avoid
>> retrying memory compaction multiple times.
>>
>> __GFP_NORETRY is implicitly set if the allocation size is larger than
>> PAGE_SIZE and __GFP_RETRY_MAYFAIL is not explicitly set.
>>
>> Cc: Aruna Ramakrishna 
>> Cc: Joe Jin 
>> Signed-off-by: Dongli Zhang 
>> ---
>> Changed since v1:
>>    - To combine kzalloc() and vzalloc() as kvzalloc()
>>  (suggested by Jason Wang)
>>
>>   drivers/vhost/scsi.c | 9 +++------
>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>> index 4ce9f00ae10e..5de21ad4bd05 100644
>> --- a/drivers/vhost/scsi.c
>> +++ b/drivers/vhost/scsi.c
>> @@ -1814,12 +1814,9 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
>>  	struct vhost_virtqueue **vqs;
>>  	int r = -ENOMEM, i;
>>  
>> -	vs = kzalloc(sizeof(*vs), GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL);
>> -	if (!vs) {
>> -		vs = vzalloc(sizeof(*vs));
>> -		if (!vs)
>> -			goto err_vs;
>> -	}
>> +	vs = kvzalloc(sizeof(*vs), GFP_KERNEL);
>> +	if (!vs)
>> +		goto err_vs;
>>  
>>  	vqs = kmalloc_array(VHOST_SCSI_MAX_VQ, sizeof(*vqs), GFP_KERNEL);
>>  	if (!vqs)
>
>
> Acked-by: Jason Wang 
>
>
>
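The flag rule in the commit message can be sketched as a pure function. The bit values are hypothetical placeholders (the real gfp_t bits are defined in include/linux/gfp.h), and model_kmalloc_flags() only illustrates the rule; it is not the mm/util.c code. The __GFP_NOWARN addition follows the "After the change" flags Dongli lists in the vdpa_sim_net thread in this archive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for illustration only. */
#define M_GFP_KERNEL        0x01u
#define M_GFP_NOWARN        0x02u
#define M_GFP_RETRY_MAYFAIL 0x04u
#define M_GFP_NORETRY       0x08u
#define M_PAGE_SIZE         4096UL

typedef unsigned int model_gfp_t;

/*
 * Flags for the first (kmalloc) attempt: above one page, add NOWARN,
 * and add NORETRY unless the caller explicitly asked for RETRY_MAYFAIL.
 */
static model_gfp_t model_kmalloc_flags(size_t size, model_gfp_t flags)
{
	model_gfp_t kflags = flags;

	if (size > M_PAGE_SIZE) {
		kflags |= M_GFP_NOWARN;
		if (!(kflags & M_GFP_RETRY_MAYFAIL))
			kflags |= M_GFP_NORETRY;
	}
	return kflags;
}
```

With __GFP_RETRY_MAYFAIL passed explicitly, as the old vhost-scsi kzalloc() call did, __GFP_NORETRY is not added, so the first attempt may keep retrying.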



Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Joe Jin
On 4/28/20 10:25 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 28, 2020 at 12:19:41PM +0200, Jürgen Groß wrote:
>> On 28.04.20 10:25, Peng Fan wrote:
> 
> Adding Joe Jin.
> 
> Joe, didn't you have some ideas on how this could be implemented?
> 
>>>> Subject: Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
>>>>
>>>> On 28.04.20 09:33, peng@nxp.com wrote:
>>>>> From: Peng Fan 
>>>>>
>>>>> When booting Xen on i.MX8QM, I hit:
>>>>> "
>>>>> [3.602128] Unable to handle kernel paging request at virtual address 0000000000272d40
>>>>> [3.610804] Mem abort info:
>>>>> [3.613905]   ESR = 0x96000004
>>>>> [3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>> [3.623211]   SET = 0, FnV = 0
>>>>> [3.626628]   EA = 0, S1PTW = 0
>>>>> [3.630128] Data abort info:
>>>>> [3.633362]   ISV = 0, ISS = 0x00000004
>>>>> [3.637630]   CM = 0, WnR = 0
>>>>> [3.640955] [0000000000272d40] user address but active_mm is swapper
>>>>> [3.647983] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>>> [3.654137] Modules linked in:
>>>>> [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
>>>>> [3.677302] Workqueue: events deferred_probe_work_func
>>>>> [3.684253] imx6q-pcie 5f000000.pcie: PCI host bridge to bus 0000:00
>>>>> [3.688297] pstate: 60000005 (nZCv daif -PAN -UAO)
>>>>> [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
>>>>> [3.693993] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>>> [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
>>>>> "
>>>>>
>>>>> In xen_swiotlb_alloc_coherent(), if !(dev_addr + size - 1 <= dma_mask)
>>>>> or range_straddles_page_boundary(phys, size) is true, it creates a
>>>>> contiguous region. So when freeing, we need to destroy the contiguous
>>>>> region using the same check condition.
>>>>
>>>> No, this will break PV guests on x86.
>>>
>>> Could you share more details on why the alloc and free checks do not match?
>>
>> xen_create_contiguous_region() is needed only in case:
>>
>> - the bus address is not within dma_mask, or
>> - the memory region is not physically contiguous (can happen only for
>>   PV guests)
>>
>> In any case it should arrange for the memory to be suitable for the
>> DMA operation, i.e. contiguous and within dma_mask afterwards. So
>> xen_destroy_contiguous_region() should only ever be called for areas
>> which meet these criteria, as otherwise we can be sure
>> xen_create_contiguous_region() was not used for making the area DMA-able
>> in the beginning.

I agree with Juergen's explanation; that matches my understanding.

Peng, if the panic is caused by (dev_addr + size - 1 > dma_mask), you should
check how you got the address: if the memory was created by
xen_create_contiguous_region(), it must lie within [0, dma_mask].

Thanks,
Joe
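A minimal userspace model of the point Juergen and Joe make: once xen_create_contiguous_region() has done its job, the buffer is contiguous and within dma_mask, so re-running the allocation-time test at free time returns false for exactly the buffers that were exchanged. All names are hypothetical, and the 1 MiB relocation target is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Allocation-time test: does this buffer need a contiguous exchange? */
static bool model_needs_exchange(uint64_t dev_addr, uint64_t size,
				 uint64_t dma_mask, bool straddles)
{
	return (dev_addr + size - 1 > dma_mask) || straddles;
}

/*
 * Hypothetical stand-in for a successful xen_create_contiguous_region():
 * the buffer ends up contiguous at an address below dma_mask. The fixed
 * 1 MiB target is arbitrary and only valid for small sizes.
 */
static uint64_t model_exchange(void)
{
	return 0x100000;
}
```

Evaluating model_needs_exchange() on an address above 4 GiB with a 32-bit mask gives true; evaluating it again on the exchanged address gives false, which is why that test alone cannot decide whether xen_destroy_contiguous_region() is needed.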

>>
>> And this is very important in the PV case, as in those guests the page
>> tables contain the host PFNs, not the guest PFNs, and
>> xen_create_contiguous_region() will fiddle with host- vs. guest-PFN
>> arrangements, and xen_destroy_contiguous_region() reverts this
>> fiddling. Any call of xen_destroy_contiguous_region() for an area it
>> was not intended to be called for might swap physical pages beneath
>> random virtual addresses, which was the reason I added this test.
>>
>>
>> Juergen
>>
>>>
>>> Thanks,
>>> Peng.
>>>
>>>>
>>>> I think there is something wrong with your setup in combination with the 
>>>> ARM
>>>> xen_create_contiguous_region() implementation.
>>>>
>>>> Stefano?
>>>>
>>>>
>>>> Juergen
>>>>
>>>>>
>>>>> Signed-off-by: Peng Fan 
>>>>> ---
>>>>>drivers/xen/swiotlb-xen.c | 4 ++--
>>>>>1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
>>>>> index b6d27762c6f8..ab96e468584f 100644
>>>>> --- a/drivers/xen/swiotlb-xen.c
>>>>> +++ b/drivers/xen/swiotlb-xen.c
>>>>> @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev,
>>>> size_t size, void *vaddr,
>>>>>   /* Convert the size to actually allocated. */
>>>>>   size = 1UL << (order + XEN_PAGE_SHIFT);
>>>>>
>>>>> - if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
>>>>> -  range_straddles_page_boundary(phys, size)) &&
>>>>> + if (((dev_addr + size - 1 > dma_mask) ||
>>>>> + range_straddles_page_boundary(phys, size)) &&
>>>>>   TestClearPageXenRemapped(virt_to_page(vaddr)))
>>>>>   xen_destroy_contiguous_region(phys, order);
>>>>>
>>>>>
>>>
>>



Re: [PATCH] tracing: make exported ftrace_set_clr_event non-static

2019-07-07 Thread Joe Jin
Patch looks good to me.

Reviewed-by: Joe Jin 

Thanks,
Joe
On 7/4/19 10:21 AM, Denis Efremov wrote:
> The function ftrace_set_clr_event is declared static and marked
> EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because it was
> decided that the function should be part of the API, this commit removes
> the static attribute and adds the declaration to the header.
> 
> Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances")
> Signed-off-by: Denis Efremov 
> ---
>  include/linux/trace_events.h | 1 +
>  kernel/trace/trace_events.c  | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 8a62731673f7..84bc84f00e8f 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -539,6 +539,7 @@ extern int trace_event_get_offsets(struct trace_event_call *call);
>  
>  #define is_signed_type(type) (((type)(-1)) < (type)1)
>  
> +int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
>  int trace_set_clr_event(const char *system, const char *event, int set);
>  
>  /*
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 0ce3db67f556..b6b46184f6bf 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
>   return ret;
>  }
>  
> -static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
> +int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
>  {
>   char *event = NULL, *sub = NULL, *match;
>   int ret;
> 



Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-10 Thread Joe Jin
On 12/10/18 12:00 PM, Tim Chen wrote:
>> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>  dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
>> size);
>>  return SWIOTLB_MAP_ERROR;
>>  found:
>> +#ifdef CONFIG_DEBUG_FS
>> +io_tlb_used += nslots;
>> +#endif
> One nit I have about this patch is that there are too many CONFIG_DEBUG_FS
> #ifdefs.
> 
> For example, here, instead of updating io_tlb_used directly, we can define a
> macro, perhaps something like inc_iotlb_used(nslots). It can be placed in the
> same section where swiotlb_create_debugfs is defined, so there's a single
> place where all the CONFIG_DEBUG_FS stuff is located.
> 
> Then define inc_iotlb_used to be a no-op when we don't have
> CONFIG_DEBUG_FS.
> 

Dongli removed the above #ifdef/#endif in his next patch, "[PATCH v2 2/2]
swiotlb: checking whether swiotlb buffer is full with io_tlb_used".

Thanks,
Joe
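Tim's suggestion can be sketched with static inline helpers rather than a macro, since the kernel coding style generally prefers inline functions over function-like macros. inc_iotlb_used() is Tim's proposed name; dec_iotlb_used() is a hypothetical counterpart for the unmap path. CONFIG_DEBUG_FS is defined by hand here only so the sketch exercises the counting branch; in a real build it comes from Kconfig.

```c
#include <assert.h>

/* Defined by hand for this demo; normally set by Kconfig. */
#define CONFIG_DEBUG_FS 1

#ifdef CONFIG_DEBUG_FS
/* Number of IO TLB slabs currently in use (debugfs accounting only). */
static unsigned long io_tlb_used;

static inline void inc_iotlb_used(unsigned int nslots)
{
	io_tlb_used += nslots;
}

static inline void dec_iotlb_used(unsigned int nslots)
{
	io_tlb_used -= nslots;
}
#else
/* Without debugfs the helpers compile away, so call sites need no #ifdef. */
static inline void inc_iotlb_used(unsigned int nslots) { (void)nslots; }
static inline void dec_iotlb_used(unsigned int nslots) { (void)nslots; }
#endif
```

The call sites would then be a plain inc_iotlb_used(nslots); after found: and dec_iotlb_used(nslots); in swiotlb_tbl_unmap_single(), with all CONFIG_DEBUG_FS handling kept in one place.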


Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-10 Thread Joe Jin
On 12/9/18 4:37 PM, Dongli Zhang wrote:
> The device driver will not be able to do DMA operations once the swiotlb
> buffer is full, either because the driver is using too many IO TLB blocks
> in flight, or because there is a memory leak in the device driver. Exporting
> the swiotlb buffer usage via debugfs would help the user estimate the size
> of the swiotlb buffer to pre-allocate, or analyze device driver memory leaks.
> 
> Signed-off-by: Dongli Zhang 

Reviewed-by: Joe Jin 

> ---
> Changed since v1:
>   * init debugfs with late_initcall (suggested by Robin Murphy)
>   * create debugfs entries with debugfs_create_ulong() (suggested by Robin Murphy)
> 
>  kernel/dma/swiotlb.c | 50 ++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..3979c2c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_DEBUG_FS
> +#include <linux/debugfs.h>
> +#endif
>  
>  #include 
>  #include 
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used += nslots;
> +#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -588,6 +601,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>*/
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used -= nslots;
> +#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> @@ -883,3 +900,36 @@ const struct dma_map_ops swiotlb_dma_ops = {
>   .dma_supported  = dma_direct_supported,
>  };
>  EXPORT_SYMBOL(swiotlb_dma_ops);
> +
> +#ifdef CONFIG_DEBUG_FS
> +
> +static int __init swiotlb_create_debugfs(void)
> +{
> + static struct dentry *d_swiotlb_usage;
> + struct dentry *ent;
> +
> + d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> + if (!d_swiotlb_usage)
> + return -ENOMEM;
> +
> + ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
> +     d_swiotlb_usage, &io_tlb_nslabs);
> + if (!ent)
> + goto fail;
> +
> + ent = debugfs_create_ulong("io_tlb_used", 0400,
> +     d_swiotlb_usage, &io_tlb_used);
> + if (!ent)
> + goto fail;
> +
> + return 0;
> +
> +fail:
> + debugfs_remove_recursive(d_swiotlb_usage);
> + return -ENOMEM;
> +}
> +
> +late_initcall(swiotlb_create_debugfs);
> +
> +#endif
> 
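With this patch applied, usage can be read from the two 0400 files in the swiotlb debugfs directory. A userspace sketch, assuming debugfs is mounted at /sys/kernel/debug and the reader is root; read_ulong_file() and swiotlb_usage_percent() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned long from a debugfs_create_ulong() file such as
 * /sys/kernel/debug/swiotlb/io_tlb_used. Returns 0 on success, -1 on error.
 */
static int read_ulong_file(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Percentage of IO TLB slabs in use; 0 if nslabs is 0 (swiotlb unused). */
static double swiotlb_usage_percent(unsigned long used, unsigned long nslabs)
{
	return nslabs ? 100.0 * (double)used / (double)nslabs : 0.0;
}
```

For example, read /sys/kernel/debug/swiotlb/io_tlb_used and /sys/kernel/debug/swiotlb/io_tlb_nslabs into two variables and pass them to swiotlb_usage_percent().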


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Director 
ORACLE | Linux and Virtualization
500 Oracle Parkway Redwood City, CA US 94065


Re: [PATCH v2 2/2] swiotlb: checking whether swiotlb buffer is full with io_tlb_used

2018-12-10 Thread Joe Jin
On 12/9/18 4:37 PM, Dongli Zhang wrote:
> This patch uses io_tlb_used to help check whether the swiotlb buffer is full.
> io_tlb_used is no longer used only for debugfs; it is also used to help
> optimize swiotlb_tbl_map_single().
> 
> Suggested-by: Joe Jin 
> Signed-off-by: Dongli Zhang 

Reviewed-by: Joe Jin 

> ---
>  kernel/dma/swiotlb.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3979c2c..9300341 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -76,12 +76,10 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> -#ifdef CONFIG_DEBUG_FS
>  /*
>   * The number of used IO TLB block
>   */
>  static unsigned long io_tlb_used;
> -#endif
>  
>  /*
>   * This is a free list describing the number of free entries available from
> @@ -489,6 +487,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>* request and allocate a buffer from that IO TLB pool.
>*/
>   spin_lock_irqsave(&io_tlb_lock, flags);
> +
> + if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
> + goto not_found;
> +
>   index = ALIGN(io_tlb_index, stride);
>   if (index >= io_tlb_nslabs)
>   index = 0;
> @@ -538,9 +540,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> -#ifdef CONFIG_DEBUG_FS
>   io_tlb_used += nslots;
> -#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -602,9 +602,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
>  
> -#ifdef CONFIG_DEBUG_FS
>   io_tlb_used -= nslots;
> -#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> 
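The new early exit can be illustrated in isolation. struct model_pool and model_cannot_fit() are hypothetical; they mirror io_tlb_nslabs, io_tlb_used and the unlikely(nslots > io_tlb_nslabs - io_tlb_used) test from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical totals standing in for io_tlb_nslabs / io_tlb_used. */
struct model_pool {
	unsigned long nslabs;
	unsigned long used;
};

/*
 * True if the request cannot possibly fit, so the slot scan can be
 * skipped. Relies on the invariant used <= nslabs.
 */
static bool model_cannot_fit(const struct model_pool *p, unsigned long nslots)
{
	return nslots > p->nslabs - p->used;
}
```

Writing the test as nslots > nslabs - used, rather than used + nslots > nslabs, cannot overflow as long as used never exceeds nslabs. Passing the check does not guarantee that the subsequent scan succeeds, since free slabs may be fragmented across segments; it only rejects requests that cannot fit at all.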


Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-06 Thread Joe Jin
On 12/6/18 9:49 PM, Dongli Zhang wrote:
> 
> 
> On 12/07/2018 12:12 AM, Joe Jin wrote:
>> Hi Dongli,
>>
>> Maybe move the d_swiotlb_usage declaration into swiotlb_create_debugfs():
> 
> I assume swiotlb_tbl_map_single() might be called frequently in some
> situations, e.g., with 'swiotlb=force'.
> 
> That's why I declare d_swiotlb_usage outside of any function and use "if
> (unlikely(!d_swiotlb_usage))".

This is reasonable.

Thanks,
Joe

> 
> I think "if (unlikely(!d_swiotlb_usage))" incurs less performance overhead
> than calling swiotlb_create_debugfs() every time to confirm whether debugfs
> has been created. I would declare d_swiotlb_usage statically inside
> swiotlb_create_debugfs() if that performance overhead is acceptable (it is
> trivial indeed).
> 
> 
> That is the reason I tagged the patch as RFC: I am not sure whether the
> on-demand creation of debugfs is acceptable to maintainers/reviewers. If
> swiotlb pages are never allocated, we would not be able to see the debugfs
> entry.
> 
> I would prefer to limit the modification to swiotlb and not touch any
> other files.
> 
> The drawback is that there is no place to create or delete the debugfs entry,
> because the swiotlb buffer could be initialized and uninitialized at a very
> early stage.
> 
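The on-demand pattern Dongli describes, a file-scope pointer checked with unlikely() on the hot path, looks roughly like this in userspace. All model_* names are hypothetical, unlikely() is redefined with __builtin_expect() (GCC/Clang), and the sketch is single-threaded: concurrent callers could both observe a NULL pointer and would need locking.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int model_create_calls;       /* how often creation ran */
static int model_dir_token;          /* object whose address is handed out */
static void *model_d_swiotlb_usage;  /* file-scope, as in the RFC patch */

/* Hypothetical stand-in for swiotlb_create_debugfs(). */
static void model_create_debugfs(void)
{
	model_create_calls++;
	model_d_swiotlb_usage = &model_dir_token;
}

/* Hot path: one predictable branch instead of a function call. */
static void model_map_single(void)
{
	if (unlikely(!model_d_swiotlb_usage))
		model_create_debugfs();
	/* ... slot allocation would follow here ... */
}
```

The v2 patch in this archive drops the on-demand approach and creates the entries from a late_initcall instead.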
>>
>> void swiotlb_create_debugfs(void)
>> {
>> #ifdef CONFIG_DEBUG_FS
>> 	/* static: zero-initialized and kept across calls */
>> 	static struct dentry *d_swiotlb_usage;
>>
>> 	if (d_swiotlb_usage)
>> 		return;
>>
>> 	d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
>>
>> 	if (!d_swiotlb_usage)
>> 		return;
>>
>> 	debugfs_create_file("usage", 0600, d_swiotlb_usage,
>> 			    NULL, &swiotlb_usage_fops);
>> #endif
>> }
>>
>> And for io_tlb_used, would it be possible to add a check at the beginning of
>> swiotlb_tbl_map_single() that returns failure directly if there are no free
>> slots, or not enough of them?
> 
> This would optimize the slot allocation path. I will follow this in the next
> version after I get more suggestions and confirmation from the maintainers.
> 
> 
> Thank you very much!
> 
> Dongli Zhang
> 
>>
>> Thanks,
>> Joe
>> On 12/5/18 7:59 PM, Dongli Zhang wrote:
>>> The device driver will not be able to do DMA operations once the swiotlb
>>> buffer is full, either because the driver is using too many IO TLB blocks
>>> in flight, or because there is a memory leak in the device driver. Exporting
>>> the swiotlb buffer usage via debugfs would help the user estimate the size
>>> of the swiotlb buffer to pre-allocate, or analyze device driver memory leaks.
>>>
>>> As swiotlb can be initialized at a very early stage, when debugfs cannot
>>> register successfully, this patch creates the debugfs entry on demand.
>>>
>>> Signed-off-by: Dongli Zhang 
>>> ---
>>>  kernel/dma/swiotlb.c | 57 
>>> 
>>>  1 file changed, 57 insertions(+)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 045930e..d3c8aa4 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -35,6 +35,9 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#ifdef CONFIG_DEBUG_FS
>>> +#include <linux/debugfs.h>
>>> +#endif
>>>  
>>>  #include 
>>>  #include 
>>> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>>>   */
>>>  static unsigned long io_tlb_nslabs;
>>>  
>>> +#ifdef CONFIG_DEBUG_FS
>>> +/*
>>> + * The number of used IO TLB block
>>> + */
>>> +static unsigned long io_tlb_used;
>>> +#endif
>>> +
>>>  /*
>>>   * This is a free list describing the number of free entries available from
>>>   * each index
>>> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>>>  
>>>  static int late_alloc;
>>>  
>>> +#ifdef CONFIG_DEBUG_FS
>>> +
>>> +static struct dentry *d_swiotlb_usage;
>>> +
>>> +static int swiotlb_usage_show(struct seq_file *m, void *v)
>>> +{
>>> +   seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
>>> +   return 0;
>>> +}
>>> +
>>> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
>>> +{
>>> +   return single_open(filp, swiotlb_usage_show, NULL);
>>> +}
>>> +
>>> +static const struc

Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-06 Thread Joe Jin
Hi Dongli,

Maybe move the d_swiotlb_usage declaration into swiotlb_create_debugfs():

void swiotlb_create_debugfs(void)
{
#ifdef CONFIG_DEBUG_FS
	/* static: zero-initialized and kept across calls */
	static struct dentry *d_swiotlb_usage;

	if (d_swiotlb_usage)
		return;

	d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);

	if (!d_swiotlb_usage)
		return;

	debugfs_create_file("usage", 0600, d_swiotlb_usage,
			    NULL, &swiotlb_usage_fops);
#endif
}

And for io_tlb_used, would it be possible to add a check at the beginning of
swiotlb_tbl_map_single() that returns failure directly if there are no free
slots, or not enough of them?

Thanks,
Joe
On 12/5/18 7:59 PM, Dongli Zhang wrote:
> The device driver will not be able to do DMA operations once the swiotlb
> buffer is full, either because the driver is using too many IO TLB blocks
> in flight, or because there is a memory leak in the device driver. Exporting
> the swiotlb buffer usage via debugfs would help the user estimate the size
> of the swiotlb buffer to pre-allocate, or analyze device driver memory leaks.
> 
> As swiotlb can be initialized at a very early stage, when debugfs cannot
> register successfully, this patch creates the debugfs entry on demand.
> 
> Signed-off-by: Dongli Zhang 
> ---
>  kernel/dma/swiotlb.c | 57 
> 
>  1 file changed, 57 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..d3c8aa4 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_DEBUG_FS
> +#include <linux/debugfs.h>
> +#endif
>  
>  #include 
>  #include 
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>  
>  static int late_alloc;
>  
> +#ifdef CONFIG_DEBUG_FS
> +
> +static struct dentry *d_swiotlb_usage;
> +
> +static int swiotlb_usage_show(struct seq_file *m, void *v)
> +{
> + seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
> + return 0;
> +}
> +
> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
> +{
> + return single_open(filp, swiotlb_usage_show, NULL);
> +}
> +
> +static const struct file_operations swiotlb_usage_fops = {
> + .open   = swiotlb_usage_open,
> + .read   = seq_read,
> + .llseek = seq_lseek,
> + .release= single_release,
> +};
> +
> +void swiotlb_create_debugfs(void)
> +{
> + d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> + if (!d_swiotlb_usage)
> + return;
> +
> + debugfs_create_file("usage", 0600, d_swiotlb_usage,
> + NULL, &swiotlb_usage_fops);
> +}
> +
> +#endif
> +
>  static int __init
>  setup_io_tlb_npages(char *str)
>  {
> @@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   pr_warn_once("%s is active and system is using DMA bounce 
> buffers\n",
>sme_active() ? "SME" : "SEV");
>  
> +#ifdef CONFIG_DEBUG_FS
> + if (unlikely(!d_swiotlb_usage))
> + swiotlb_create_debugfs();
> +#endif
> +
>   mask = dma_get_seg_boundary(hwdev);
>  
>   tbl_dma_addr &= mask;
> @@ -528,6 +578,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used += nslots;
> +#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -588,6 +641,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>*/
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used -= nslots;
> +#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> 




Re: [PATCH 4.4 010/268] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent

2018-06-07 Thread Joe Jin
On 6/7/18 1:28 PM, Ben Hutchings wrote:
> On Mon, 2018-05-28 at 11:59 +0200, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ----------
>>
>> From: Joe Jin 
>>
>> commit 4855c92dbb7b3b85c23e88ab7ca04f99b9677b41 upstream.
>>
>> When running raidconfig from Dom0, we found that the Xen DMA heap is reduced,
>> but the Dom heap is increased by the same size. Tracing raidconfig, we found
>> that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
>> to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
>> it is exchanged with Xen to meet the requirement. Later, when the driver
>> calls dma_free_coherent() to free the memory, in xen_swiotlb_free_coherent()
>> the check condition (dev_addr + size - 1 <= dma_mask) is always false,
> 
> I think this was meant to say (dev_addr + size - 1 > dma_mask), i.e.

Hi Ben,

Yes, you are right. Sorry for the mistake, and thanks for catching it.
Is there any way to fix the description in the git repo?

Regards,
Joe

> the condition that is replaced by this commit.  If that's always false,
> the new condition (the logical inverse) must always be true.
> 
> [...]
>> --- a/drivers/xen/swiotlb-xen.c
>> +++ b/drivers/xen/swiotlb-xen.c
>> @@ -359,7 +359,7 @@ xen_swiotlb_free_coherent(struct device
>>   * physical address */
>>  phys = xen_bus_to_phys(dev_addr);
>>  
>> -if (((dev_addr + size - 1 > dma_mask)) ||
>> +if (((dev_addr + size - 1 <= dma_mask)) ||
>>  range_straddles_page_boundary(phys, size))
>>  xen_destroy_contiguous_region(phys, order);
>>  
> 
> So now we will always call xen_destroy_contiguous_region(), whether or
> not xen_create_contiguous_region() was called during allocation.  Is
> that really the intent?  If so, the entire condition could be removed
> to make this clear.
> 
> Alternately, if the commit message is correct, the condition could be
> simplified to range_straddles_page_boundary(...).
> 
> But I'm not at all convinced that either of these is correct.  It seems
> like you need to either find a way of distinguishing between memory
> allocated with or without the use of xen_create_contiguous_region(), or
> to use it unconditionally.
> 
> Ben.
> 
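Ben's first option, distinguishing buffers that went through xen_create_contiguous_region(), can be modelled with a per-buffer flag that is set at allocation and tested-and-cleared at free. The 2020 swiotlb-xen diff earlier in this archive guards the call with TestClearPageXenRemapped(); this userspace sketch only illustrates that idea, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-buffer state standing in for a struct page flag. */
struct model_buf {
	bool xen_remapped;
};

static int model_destroy_calls;

/* Stand-in for xen_destroy_contiguous_region(). */
static void model_destroy_contiguous(struct model_buf *b)
{
	(void)b;
	model_destroy_calls++;
}

/* Allocation: remember whether an exchange actually happened. */
static void model_alloc(struct model_buf *b, bool needed_exchange)
{
	b->xen_remapped = needed_exchange;
}

/* Test-and-clear, so the exchange is undone at most once. */
static bool model_test_and_clear_remapped(struct model_buf *b)
{
	bool was = b->xen_remapped;

	b->xen_remapped = false;
	return was;
}

/* Free: undo the exchange only for buffers that were exchanged. */
static void model_free(struct model_buf *b)
{
	if (model_test_and_clear_remapped(b))
		model_destroy_contiguous(b);
}
```

Unlike either address comparison, the flag records what the allocation path actually did, so the free path no longer has to infer it from the buffer's final address.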


-- 
Oracle <http://www.oracle.com>
Joe Jin | IT Director 
ORACLE | Production Engineering and Operations
600 Oracle Parkway Redwood City, CA US 94065


[PATCH] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent

2018-05-17 Thread Joe Jin
When running raidconfig from Dom0, we found that the Xen DMA heap is reduced,
but the Dom heap is increased by the same size. Tracing raidconfig, we found
that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
it is exchanged with Xen to meet the requirement. Later, when the driver
calls dma_free_coherent() to free the memory, in xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false, which
prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.

This issue introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".

Signed-off-by: Joe Jin <joe@oracle.com>
Tested-by: John Sobecki <john.sobe...@oracle.com> 
Reviewed-by: Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: sta...@vger.kernel.org
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c60899fdbc..a6f9ba85dc4b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -351,7 +351,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 	 * physical address */
 	phys = xen_bus_to_phys(dev_addr);
 
-	if (((dev_addr + size - 1 > dma_mask)) ||
+	if (((dev_addr + size - 1 <= dma_mask)) ||
 	    range_straddles_page_boundary(phys, size))
 		xen_destroy_contiguous_region(phys, order);
 
-- 
2.14.3 (Apple Git-98)
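The commit message's argument can be checked with a few lines of plain C. This is a simplified model, not the driver code: `fits_mask()`, `exchange_at_alloc()`, `destroy_at_free_old()` and `destroy_at_free_new()` are hypothetical helpers, page-boundary straddling is left out, and it assumes (as described above) that an exchanged buffer is handed back at a bus address that fits `dma_mask`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does the whole buffer [addr, addr + size) lie at
 * or below dma_mask? Mirrors the "dev_addr + size - 1 <= dma_mask" test. */
static bool fits_mask(uint64_t addr, uint64_t size, uint64_t dma_mask)
{
	assert(size > 0);
	return addr + size - 1 <= dma_mask;
}

/* Allocation side (simplified): the buffer is exchanged with Xen only
 * when its original placement does not fit the device's DMA mask. */
static bool exchange_at_alloc(uint64_t orig_addr, uint64_t size,
			      uint64_t dma_mask)
{
	return !fits_mask(orig_addr, size, dma_mask);
}

/* Free side, original condition: destroy only if the freed address does
 * NOT fit the mask. After an exchange it always fits, so this is false. */
static bool destroy_at_free_old(uint64_t freed_addr, uint64_t size,
				uint64_t dma_mask)
{
	return !fits_mask(freed_addr, size, dma_mask);
}

/* Free side, patched condition: destroy whenever the freed address fits,
 * which is also true for memory that was never exchanged. */
static bool destroy_at_free_new(uint64_t freed_addr, uint64_t size,
				uint64_t dma_mask)
{
	return fits_mask(freed_addr, size, dma_mask);
}
```

With a 32-bit mask, a 4 KiB buffer first placed at `0x100000000` is exchanged, and the address later passed to the free path (for example `0x1000`) fits the mask. `destroy_at_free_old()` then returns false, which is the leak described above; `destroy_at_free_new()` returns true, but it also returns true for a buffer that sat at `0x1000` all along and was never exchanged, which is the concern raised in review.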



Re: [PATCH UPSTREAM] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent

2018-05-17 Thread Joe Jin
On 5/17/18 12:10 PM, Greg KH wrote:
> On Thu, May 17, 2018 at 11:45:57AM -0700, Joe Jin wrote:
>> When running raidconfig from Dom0, we found that the Xen DMA heap is reduced,
>> but the Dom heap is increased by the same size. Tracing raidconfig, we found
>> that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
>> to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
>> it is exchanged with Xen to meet the requirement. Later, the driver
>> calls dma_free_coherent() to free the memory; in xen_swiotlb_free_coherent()
>> the check condition (dev_addr + size - 1 <= dma_mask) is always false,
>> which prevents calling xen_destroy_contiguous_region() to return the memory
>> to the Xen DMA heap.
>>
>> This issue was introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
>> coherent alloc/dealloc check before swizzling the MFNs.".
>>
>> Signed-off-by: Joe Jin <joe@oracle.com>
>> Tested-by: John Sobecki <john.sobe...@oracle.com> 
>> Reviewed-by: Rzeszutek Wilk <konrad.w...@oracle.com>
>> Cc: sta...@vger.kernel.org
>> ---
>>  drivers/xen/swiotlb-xen.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> What does "PATCH UPSTREAM" mean?

Oops, I forgot to remove UPSTREAM, which is the tag we use for internal review.

Sorry about this; I will resend it without the tag.

Thanks,
Joe

> 
> confused,
> 
> greg k-h
>


[PATCH UPSTREAM] xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent

2018-05-17 Thread Joe Jin
When running raidconfig from Dom0, we found that the Xen DMA heap is reduced,
but the Dom heap is increased by the same size. Tracing raidconfig, we found
that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
it is exchanged with Xen to meet the requirement. Later, the driver
calls dma_free_coherent() to free the memory; in xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
which prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.

This issue was introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".

Signed-off-by: Joe Jin <joe@oracle.com>
Tested-by: John Sobecki <john.sobe...@oracle.com> 
Reviewed-by: Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: sta...@vger.kernel.org
---
 drivers/xen/swiotlb-xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c60899fdbc..a6f9ba85dc4b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -351,7 +351,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 	 * physical address */
 	phys = xen_bus_to_phys(dev_addr);
 
-	if (((dev_addr + size - 1 > dma_mask)) ||
+	if (((dev_addr + size - 1 <= dma_mask)) ||
 	    range_straddles_page_boundary(phys, size))
 		xen_destroy_contiguous_region(phys, order);
 
-- 
2.14.3 (Apple Git-98)



Re: [PATCH V2] [scsi] enclosure: remove duplicate device before add new

2013-09-24 Thread Joe Jin
Hi James, 

Could you please review the patch and comment on it?

Thanks,
Joe

On 09/20/13 08:16, Joe Jin wrote:
> While doing a disk pull/insert test, we encountered the following:
> 
> WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xbc/0xe0()
> Hardware name: SUN FIRE X4370 M2 SERVER
> sysfs: cannot create duplicate filename 
> '/devices/pci:00/:00:03.0/:0d:00.0/host6/port-6:1/expander-6:1/port-6:1:14/end_device-6:1:14/target6:0:27/6:0:27:0/enclosure_device:HDD10'
> Modules linked in: mptctl mptbase autofs4 hidp bluetooth rfkill lockd sunrpc 
> bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad 
> ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio 
> libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath video 
> sbs sbshc acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport ipmi_si 
> ipmi_devintf ipmi_msghandler sg ses enclosure ixgbe e1000e hwmon igb 
> snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device 
> snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc 
> iTCO_wdt pcspkr i2c_i801 ioatdma ghes iTCO_vendor_support hed dca i2c_core 
> i7core_edac edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log 
> dm_mod usb_storage shpchp mpt2sas scsi_transport_sas raid_class ahci libahci 
> sd_mod crc_t10dif raid1 ext3 jbd mbcache
> Pid: 23302, comm: kworker/u:2 Tainted: P2.6.39-400.124.1.el5uek #1
> Call Trace:
>  [] ? sysfs_add_one+0xbc/0xe0
>  [] warn_slowpath_common+0x90/0xc0
>  [] warn_slowpath_fmt+0x6e/0x70
>  [] ? strlcat+0x54/0x70
>  [] sysfs_add_one+0xbc/0xe0
>  [] sysfs_do_create_link+0x148/0x1d0
>  [] sysfs_create_link+0x13/0x20
>  [] enclosure_add_links+0xe7/0x110 [enclosure]
>  [] ? kobject_release+0xd/0x10
>  [] ? kref_put+0x37/0x70
>  [] enclosure_add_device+0x93/0xa0 [enclosure]
>  [] ses_enclosure_find_by_addr+0x76/0xc0 [ses]
>  [] ? ses_get_fault+0x40/0x40 [ses]
>  [] enclosure_for_each_device+0x63/0x90 [enclosure]
>  [] ses_match_to_enclosure+0x11a/0x1d0 [ses]
>  [] ses_intf_add+0x2c8/0x5c0 [ses]
>  [] ? kobject_get+0x1a/0x30
>  [] ? add_tail+0x36/0x50
>  [] device_add+0x2d4/0x380
>  [] scsi_sysfs_add_sdev+0xe6/0x2a0
>  [] scsi_add_lun+0x41c/0x560
>  [] scsi_probe_and_add_lun+0x1e0/0x3e0
>  [] ? default_spin_lock_flags+0x9/0x10
>  [] __scsi_scan_target+0xe7/0x120
>  [] scsi_scan_target+0xcd/0xf0
>  [] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
>  [] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
>  [] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
>  [] _scsih_add_device+0x248/0x340 [mpt2sas]
>  [] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
>  [] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
>  [] ? add_timer+0x18/0x20
>  [] ? queue_delayed_work_on+0xc5/0x170
>  [] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
>  [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
>  [] process_one_work+0xf9/0x370
>  [] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
>  [] worker_thread+0xca/0x240
>  [] ? manage_workers+0x90/0x90
>  [] kthread+0x97/0xa0
>  [] kernel_thread_helper+0x4/0x10
>  [] ? kthread_bind+0x80/0x80
>  [] ? gs_change+0x13/0x13
> ---[ end trace 89a1351702ab360f ]---
> 
> This is caused by a duplicate device in the enclosure list; we need to remove
> the possible duplicate entry to avoid the conflict when we add the new one.
> 
> Cc: James Bottomley 
> Signed-off-by: Joe Jin 
> ---
>  drivers/misc/enclosure.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
> index 0e8df41..173974d 100644
> --- a/drivers/misc/enclosure.c
> +++ b/drivers/misc/enclosure.c
> @@ -325,6 +325,8 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
>  	if (cdev->dev)
>  		enclosure_remove_links(cdev);
>  
> +	enclosure_remove_device(edev, dev);
> +
>  	put_device(cdev->dev);
>  	cdev->dev = get_device(dev);
>  	return enclosure_add_links(cdev);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] [scsi] enclosure: remove duplicate device before add new

2013-09-19 Thread Joe Jin
While doing a disk pull/insert test, we encountered the following:

WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xbc/0xe0()
Hardware name: SUN FIRE X4370 M2 SERVER
sysfs: cannot create duplicate filename 
'/devices/pci:00/:00:03.0/:0d:00.0/host6/port-6:1/expander-6:1/port-6:1:14/end_device-6:1:14/target6:0:27/6:0:27:0/enclosure_device:HDD10'
Modules linked in: mptctl mptbase autofs4 hidp bluetooth rfkill lockd sunrpc 
bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad 
ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio 
libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath video 
sbs sbshc acpi_pad acpi_memhotplug acpi_ipmi parport_pc lp parport ipmi_si 
ipmi_devintf ipmi_msghandler sg ses enclosure ixgbe e1000e hwmon igb 
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss 
snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr 
i2c_i801 ioatdma ghes iTCO_vendor_support hed dca i2c_core i7core_edac 
edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod 
usb_storage shpchp mpt2sas scsi_transport_sas raid_class ahci libahci sd_mod 
crc_t10dif raid1 ext3 jbd mbcache
Pid: 23302, comm: kworker/u:2 Tainted: P2.6.39-400.124.1.el5uek #1
Call Trace:
 [] ? sysfs_add_one+0xbc/0xe0
 [] warn_slowpath_common+0x90/0xc0
 [] warn_slowpath_fmt+0x6e/0x70
 [] ? strlcat+0x54/0x70
 [] sysfs_add_one+0xbc/0xe0
 [] sysfs_do_create_link+0x148/0x1d0
 [] sysfs_create_link+0x13/0x20
 [] enclosure_add_links+0xe7/0x110 [enclosure]
 [] ? kobject_release+0xd/0x10
 [] ? kref_put+0x37/0x70
 [] enclosure_add_device+0x93/0xa0 [enclosure]
 [] ses_enclosure_find_by_addr+0x76/0xc0 [ses]
 [] ? ses_get_fault+0x40/0x40 [ses]
 [] enclosure_for_each_device+0x63/0x90 [enclosure]
 [] ses_match_to_enclosure+0x11a/0x1d0 [ses]
 [] ses_intf_add+0x2c8/0x5c0 [ses]
 [] ? kobject_get+0x1a/0x30
 [] ? add_tail+0x36/0x50
 [] device_add+0x2d4/0x380
 [] scsi_sysfs_add_sdev+0xe6/0x2a0
 [] scsi_add_lun+0x41c/0x560
 [] scsi_probe_and_add_lun+0x1e0/0x3e0
 [] ? default_spin_lock_flags+0x9/0x10
 [] __scsi_scan_target+0xe7/0x120
 [] scsi_scan_target+0xcd/0xf0
 [] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
 [] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
 [] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
 [] _scsih_add_device+0x248/0x340 [mpt2sas]
 [] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
 [] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
 [] ? add_timer+0x18/0x20
 [] ? queue_delayed_work_on+0xc5/0x170
 [] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
 [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [] process_one_work+0xf9/0x370
 [] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
 [] worker_thread+0xca/0x240
 [] ? manage_workers+0x90/0x90
 [] kthread+0x97/0xa0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_bind+0x80/0x80
 [] ? gs_change+0x13/0x13
---[ end trace 89a1351702ab360f ]---

This is caused by a duplicate device in the enclosure list; we need to remove
the possible duplicate entry to avoid the conflict when we add the new one.

Cc: James Bottomley 
Signed-off-by: Joe Jin 
---
 drivers/misc/enclosure.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 0e8df41..173974d 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -325,6 +325,8 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
 	if (cdev->dev)
 		enclosure_remove_links(cdev);
 
+	enclosure_remove_device(edev, dev);
+
 	put_device(cdev->dev);
 	cdev->dev = get_device(dev);
 	return enclosure_add_links(cdev);
-- 
1.8.3.1
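The intent of the V2 change, dropping any existing association of `dev` before attaching it to the requested component, can be modelled in plain C. This is a hypothetical sketch: `struct model_enclosure`, `MODEL_COMPONENTS`, `MODEL_NO_DEV` and the `model_*` functions are illustrations, integer IDs stand in for `struct device *`, and sysfs links and reference counting are ignored. It only shows the invariant the patch aims for: after an add, a device is referenced by at most one component slot.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_DEV 0     /* hypothetical "empty slot" marker */
#define MODEL_COMPONENTS 8 /* hypothetical number of component slots */

/* Hypothetical model: each component slot holds a device ID or 0. */
struct model_enclosure {
	int component_dev[MODEL_COMPONENTS];
};

/* Detach dev from whichever slot currently holds it, if any.
 * Returns that slot index, or -1 if dev was not attached. */
static int model_remove_device(struct model_enclosure *e, int dev)
{
	assert(e != NULL);
	for (int i = 0; i < MODEL_COMPONENTS; i++) {
		if (e->component_dev[i] == dev) {
			e->component_dev[i] = MODEL_NO_DEV;
			return i;
		}
	}
	return -1;
}

/* Attach dev to slot `component`, first removing any stale entry for
 * the same device, in the same order as the V2 patch. */
static int model_add_device(struct model_enclosure *e, int component, int dev)
{
	if (component < 0 || component >= MODEL_COMPONENTS ||
	    dev == MODEL_NO_DEV)
		return -1;
	model_remove_device(e, dev);
	e->component_dev[component] = dev;
	return 0;
}

/* Count how many slots reference dev; the invariant is at most one. */
static int model_count(const struct model_enclosure *e, int dev)
{
	int n = 0;

	for (int i = 0; i < MODEL_COMPONENTS; i++)
		n += (e->component_dev[i] == dev);
	return n;
}
```

Adding device 27 to slot 4 and then to slot 5 leaves slot 4 empty and slot 5 holding the device. This model does not address the remove/re-add ordering question raised in the review of the earlier version of the patch.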



Re: [PATCH] [scsi] enclosure: remove all possible sysfs entries before add device

2013-09-11 Thread Joe Jin
add_tail+0x36/0x50
 [] device_add+0x2d4/0x380
 [] scsi_sysfs_add_sdev+0xe6/0x2a0
 [] scsi_add_lun+0x41c/0x560
 [] scsi_probe_and_add_lun+0x1e0/0x3e0
 [] ? default_spin_lock_flags+0x9/0x10
 [] __scsi_scan_target+0xe7/0x120
 [] scsi_scan_target+0xcd/0xf0
 [] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
 [] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
 [] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
 [] _scsih_add_device+0x248/0x340 [mpt2sas]
 [] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
 [] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
 [] ? add_timer+0x18/0x20
 [] ? queue_delayed_work_on+0xc5/0x170
 [] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
 [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [] process_one_work+0xf9/0x370
 [] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [] process_one_work+0xf9/0x370
 [] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
 [] worker_thread+0xca/0x240
 [] ? manage_workers+0x90/0x90
 [] kthread+0x97/0xa0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_bind+0x80/0x80
 [] ? gs_change+0x13/0x13
---[ end trace 89a1351702ab360f ]---
[ses_enclosure_find_by_addr] call enclosure_add_device(edev=8817e4094000,i=4,efd->dev=8817e8304938),cdev=8817e4094cd0

From the message above, you can see that the last attempt was for
enclosure_device:HDD10, and the component index was not the same, so it conflicted.

BTW, 6:0:27:0 and 7:0:27:0 are the same disk.

> 
>> > Cc: James Bottomley 
>> > Signed-off-by: Joe Jin 
>> > ---
>> >  drivers/misc/enclosure.c | 7 +++
>> >  1 file changed, 7 insertions(+)
>> > 
>> > diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
>> > index 0e8df41..efc0e86 100644
>> > --- a/drivers/misc/enclosure.c
>> > +++ b/drivers/misc/enclosure.c
>> > @@ -325,6 +325,13 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
>> >if (cdev->dev)
>> >enclosure_remove_links(cdev);
>> >  
>> > +  if (dev) {
> This test is pointless.  Adding a NULL device is illegal.

Yes, this is right.

Thanks,
Joe


> 
>> > +  char name[ENCLOSURE_NAME_SIZE];
>> > +
>> > +  enclosure_link_name(cdev, name);
>> > +  sysfs_remove_link(&dev->kobj, name);
> If we're really going to force eject the device, then this should be
> enclosure_remove_device(edev, dev);
> 
> How do you prevent the case for remove re-add in the same slot?  Surely
> in that case, with your code, the link will get removed again when the
> remove gets processed, so the slot will then look empty (even though
> it's not).




Re: [PATCH] [scsi] enclosure: remove all possible sysfs entries before add device

2013-09-09 Thread Joe Jin
On 09/09/13 21:41, Christoph Hellwig wrote:
>> Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U)
> 
> Please reproduce without this weird crap loaded.
> 
These are filesystem modules and do not affect the enclosure driver.

Thanks,
Joe






[PATCH] [scsi] enclosure: remove all possible sysfs entries before add device

2013-09-09 Thread Joe Jin
During a disk pull/insert test we encountered the following warning:

WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xbc/0xe0()
Hardware name: SUN FIRE X4370 M2 SERVER
sysfs: cannot create duplicate filename 
'/devices/pci:00/:00:03.0/:0d:00.0/host6/port-6:1/expander-6:1/port-6:1:14/end_device-6:1:14/target6:0:27/6:0:27:0/enclosure_device:HDD10'
Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) mptctl 
mptbase autofs4 hidp bluetooth rfkill lockd sunrpc bonding be2iscsi 
iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr 
iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi 
scsi_transport_iscsi dm_round_robin dm_multipath video sbs sbshc acpi_pad 
acpi_memhotplug acpi_ipmi parport_pc lp parport ipmi_si ipmi_devintf 
ipmi_msghandler sg ses enclosure ixgbe e1000e hwmon igb snd_seq_dummy 
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss 
snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr i2c_i801 ioatdma 
ghes iTCO_vendor_support hed dca i2c_core i7core_edac edac_core dm_snapshot 
dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage shpchp mpt2sas 
scsi_transport_sas raid_class ahci libahci sd_mod crc_t10dif raid1 ext3 jbd 
mbcache
Pid: 23302, comm: kworker/u:2 Tainted: P 2.6.39-400.124.1.el5uek #1
Call Trace:
 [811daf8c] ? sysfs_add_one+0xbc/0xe0
 [8106f030] warn_slowpath_common+0x90/0xc0
 [8106f15e] warn_slowpath_fmt+0x6e/0x70
 [81258bd4] ? strlcat+0x54/0x70
 [811daf8c] sysfs_add_one+0xbc/0xe0
 [811dbec8] sysfs_do_create_link+0x148/0x1d0
 [811dbf83] sysfs_create_link+0x13/0x20
 [a00de307] enclosure_add_links+0xe7/0x110 [enclosure]
 [8125325d] ? kobject_release+0xd/0x10
 [812549e7] ? kref_put+0x37/0x70
 [a00de3c3] enclosure_add_device+0x93/0xa0 [enclosure]
 [a00c8666] ses_enclosure_find_by_addr+0x76/0xc0 [ses]
 [a00c85f0] ? ses_get_fault+0x40/0x40 [ses]
 [a00de433] enclosure_for_each_device+0x63/0x90 [enclosure]
 [a00c8a8a] ses_match_to_enclosure+0x11a/0x1d0 [ses]
 [a00c8e08] ses_intf_add+0x2c8/0x5c0 [ses]
 [8125327a] ? kobject_get+0x1a/0x30
 [814e8b56] ? add_tail+0x36/0x50
 [81345ae4] device_add+0x2d4/0x380
 [8136b096] scsi_sysfs_add_sdev+0xe6/0x2a0
 [813682cc] scsi_add_lun+0x41c/0x560
 [81368a80] scsi_probe_and_add_lun+0x1e0/0x3e0
 [81041009] ? default_spin_lock_flags+0x9/0x10
 [813696e7] __scsi_scan_target+0xe7/0x120
 [81369b8d] scsi_scan_target+0xcd/0xf0
 [a003faab] sas_rphy_add+0x11b/0x170 [scsi_transport_sas]
 [a009a74f] mpt2sas_transport_port_add+0x2cf/0x430 [mpt2sas]
 [a008d437] _scsih_sas_device_add+0x87/0x110 [mpt2sas]
 [a0094eb8] _scsih_add_device+0x248/0x340 [mpt2sas]
 [a0098cb1] ? mpt2sas_transport_update_links+0xf1/0x190 [mpt2sas]
 [a00977b6] _scsih_sas_topology_change_event+0x3c6/0x490 [mpt2sas]
 [81080698] ? add_timer+0x18/0x20
 [8108a405] ? queue_delayed_work_on+0xc5/0x170
 [a0097a85] _mpt2sas_fw_work+0x205/0x240 [mpt2sas]
 [a0097ad9] _firmware_event_work_delayed+0x19/0x20 [mpt2sas]
 [8108c0d9] process_one_work+0xf9/0x370
 [a0097ac0] ? _mpt2sas_fw_work+0x240/0x240 [mpt2sas]
 [8108ca1a] worker_thread+0xca/0x240
 [8108c950] ? manage_workers+0x90/0x90
 [81090ff7] kthread+0x97/0xa0
 [8150fdc4] kernel_thread_helper+0x4/0x10
 [81090f60] ? kthread_bind+0x80/0x80
 [8150fdc0] ? gs_change+0x13/0x13
---[ end trace 89a1351702ab360f ]---

During our test multipath was in use, so each LUN had two paths. When the
second path was added, enclosure did not check whether the symlink for the
device being added already existed.

Cc: James Bottomley <james.bottom...@hansenpartnership.com>
Signed-off-by: Joe Jin <joe@oracle.com>
---
 drivers/misc/enclosure.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 0e8df41..efc0e86 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -325,6 +325,13 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
 	if (cdev->dev)
 		enclosure_remove_links(cdev);
 
+	if (dev) {
+		char name[ENCLOSURE_NAME_SIZE];
+
+		enclosure_link_name(cdev, name);
+		sysfs_remove_link(&dev->kobj, name);
+	}
+
 	put_device(cdev->dev);
 	cdev->dev = get_device(dev);
 	return enclosure_add_links(cdev);
-- 
1.8.3.1
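The same failure mode can be reproduced in userspace with symlink(2), which fails with EEXIST when the name is already taken, much as sysfs_create_link() refuses a duplicate filename in the trace above. The sketch is only an analogy of the "remove the stale link, then add it" idea in this patch, not the enclosure driver; relink() and the file names used with it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Userspace analogy: symlink() refuses to create a name that already
 * exists (EEXIST). relink() first removes any existing entry with that
 * name, then creates the new link.
 * Returns 0 on success or a negative errno value. */
static int relink(const char *target, const char *linkpath)
{
	/* A missing entry is fine; any other unlink() error is reported. */
	if (unlink(linkpath) != 0 && errno != ENOENT)
		return -errno;
	if (symlink(target, linkpath) != 0)
		return -errno;
	return 0;
}
```

The unlink()/symlink() pair is not atomic, and removing an entry purely by name cannot tell whose link it is deleting.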



Re: [PATCH] dm: allow error target to replace either bio-based and request-based targets

2013-08-22 Thread Joe Jin
On 08/23/13 08:17, Mike Snitzer wrote:
> Here is a patch that should work for your needs (I tested it to work
> with 'dmsetup wipe_table' on both request-based and bio-based devices):

This is really what I was looking for, thanks!

> 
> From: Mike Snitzer <snit...@redhat.com>
> Date: Thu, 22 Aug 2013 18:21:38 -0400
> Subject: [PATCH] dm: allow error target to replace either bio-based and 
> request-based targets
> 
> It may be useful to switch a request-based table to the "error" target.
> Enhance the DM core to allow a single hybrid target to be capable of
> handling either bios or requests.
> 
> Add a request-based (.map_rq) member to the error target_type and train
> dm_table_set_type() to prefer the md's established type (request-based
> or bio-based).  If the md doesn't have an established type, default to
> making the hybrid target bio-based.

Signed-off-by: Joe Jin <joe@oracle.com>

Thanks,
Joe
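The type rule in Mike's commit message can be sketched as a small decision function. This is a hedged model, not dm_table_set_type() itself: the enum values, struct target_caps and choose_table_type() are hypothetical, and treating a mix of bio-only and request-only targets as invalid is an assumption about how mixed tables are handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the DM core's constants. */
enum md_type { TYPE_NONE, TYPE_BIO_BASED, TYPE_REQUEST_BASED, TYPE_INVALID };

struct target_caps {
	bool has_map;    /* can map bios (.map) */
	bool has_map_rq; /* can map requests (.map_rq) */
};

/* Targets that provide only one of the two hooks force that type, and
 * mixing the two kinds is rejected. If every target is hybrid, prefer the
 * type the mapped device already has and fall back to bio-based when it
 * has none, as the commit message describes. */
static enum md_type choose_table_type(const struct target_caps *t, unsigned n,
				      enum md_type established)
{
	bool bio_only = false, rq_only = false;

	for (unsigned i = 0; i < n; i++) {
		if (t[i].has_map && !t[i].has_map_rq)
			bio_only = true;
		else if (t[i].has_map_rq && !t[i].has_map)
			rq_only = true;
	}

	if (bio_only && rq_only)
		return TYPE_INVALID;
	if (bio_only)
		return TYPE_BIO_BASED;
	if (rq_only)
		return TYPE_REQUEST_BASED;
	/* Only hybrid targets (such as "error" with .map and .map_rq). */
	return established != TYPE_NONE ? established : TYPE_BIO_BASED;
}
```

Under this rule, loading the hybrid error target into a device that is already request-based keeps it request-based, which matches the wipe_table use case tested above.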


Re: [dm-devel] [PATCH v2] dm ioctl: allow change device target type to error

2013-08-21 Thread Joe Jin
Mikulas, thanks for your suggestions. I have created a new patch; could you
please help review it?

Subject: dm: add map_rq define for error

Commit a5664da ("dm ioctl: make bio or request based device type immutable")
prevented "dmsetup wipe_table" from changing the target type to "error",
because the error target type has no map_rq.

Signed-off-by: Joe Jin <joe@oracle.com>
---
 drivers/md/dm-target.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 37ba5db..b690910 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -131,12 +131,19 @@ static int io_err_map(struct dm_target *tt, struct bio *bio)
 	return -EIO;
 }
 
+static int io_err_map_rq(struct dm_target *ti, struct request *clone,
+			 union map_info *map_context)
+{
+	return -EIO;
+}
+
 static struct target_type error_target = {
 	.name = "error",
 	.version = {1, 1, 0},
 	.ctr  = io_err_ctr,
 	.dtr  = io_err_dtr,
 	.map  = io_err_map,
+	.map_rq = io_err_map_rq,
 };
 
 int __init dm_target_init(void)
-- 
1.8.3.1
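The idea behind this patch, that a target implementing both hooks can sit under either kind of device, can be shown with a cut-down function-pointer table. This is a userspace sketch under simplifying assumptions: struct toy_target_type, toy_map_request() and the one-argument hook signatures are hypothetical and are not the kernel's struct target_type API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins; the real kernel types carry much more. */
struct bio;
struct request;

/* Hypothetical, simplified target type with only the two map hooks. */
struct toy_target_type {
	const char *name;
	int (*map)(struct bio *bio);
	int (*map_rq)(struct request *clone);
};

/* Both hooks fail every I/O, mirroring io_err_map()/io_err_map_rq(). */
static int toy_err_map(struct bio *bio)
{
	(void)bio;
	return -EIO;
}

static int toy_err_map_rq(struct request *clone)
{
	(void)clone;
	return -EIO;
}

static const struct toy_target_type toy_error_target = {
	.name = "error",
	.map = toy_err_map,
	.map_rq = toy_err_map_rq,
};

/* A request-based device can only hand a request to a target that
 * implements map_rq; without it the target cannot be used there. */
static int toy_map_request(const struct toy_target_type *tt,
			   struct request *rq)
{
	if (tt->map_rq == NULL)
		return -EINVAL;
	return tt->map_rq(rq);
}
```

A bio-only target (map set, map_rq NULL) is rejected by toy_map_request(), while the error target with both hooks returns -EIO on either path.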

On 08/21/13 22:48, Mikulas Patocka wrote:
> 
> 
> On Wed, 21 Aug 2013, Joe Jin wrote:
> 
>> commit a5664da "dm ioctl: make bio or request based device type immutable"
>> prevented "dmsetup wape_table" change the target type to "error".
> 
> That commit a5664da is there for a reason (it is not possible to change 
> bio-based device to request-based and vice versa) and I don't really see 
> how this patch is supposed to work.
> 
> If there are bios that are in flight and that already passed through 
> blk_queue_bio, and you change the device from request-based to bio-based, 
> what are you going to do with them? - The patch doesn't do anything about 
> it.
> 
> A better approach would be to create a new request-based target "error-rq" 
> and change the multipath target to "error-rq" target. That way, you don't 
> have to change device type from request based to bio based.
> 
> Mikulas
> 
>> -v2: setup md->queue even target type is "error".
>>
>> Signed-off-by: Joe Jin <joe@oracle.com>
>> ---
>>  drivers/md/dm-ioctl.c |  4 
>>  drivers/md/dm-table.c | 12 
>>  drivers/md/dm.h   |  1 +
>>  3 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
>> index f1b7586..2a9b63d 100644
>> --- a/drivers/md/dm-ioctl.c
>> +++ b/drivers/md/dm-ioctl.c
>> @@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t 
>> param_size)
>>  goto out;
>>  }
>>  
>> +if (dm_is_error_target(t))
>> +goto error_target;
>> +
>>  /* Protect md->type and md->queue against concurrent table loads. */
>>  dm_lock_md_type(md);
>>  if (dm_get_md_type(md) == DM_TYPE_NONE)
>> @@ -1293,6 +1296,7 @@ static int table_load(struct dm_ioctl *param, size_t 
>> param_size)
>>  goto out;
>>  }
>>  
>> +error_target:
>>  /* setup md->queue to reflect md's type (may block) */
>>  r = dm_setup_md_queue(md);
>>  if (r) {
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index f221812..27be46a 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned 
>> int num)
>>  return 0;
>>  }
>>  
>> +bool dm_is_error_target(struct dm_table *t)
>> +{
>> +unsigned i;
>> +
>> +for (i = 0; i < t->num_targets; i++) {
>> +struct dm_target *tgt = t->targets + i;
>> +if (strcmp(tgt->type->name, "error") == 0)
>> +return true;
>> +}
>> +return false;
>> +}
>> +
>>  int dm_table_create(struct dm_table **result, fmode_t mode,
>>  unsigned num_targets, struct mapped_device *md)
>>  {
>> diff --git a/drivers/md/dm.h b/drivers/md/dm.h
>> index 45b97da..c7bceeb 100644
>> --- a/drivers/md/dm.h
>> +++ b/drivers/md/dm.h
>> @@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
>>  struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
>>  bool dm_table_request_based(struct dm_table *t);
>>  bool dm_table_supports_discards(struct dm_table *t);
>> +bool dm_is_error_target(struct dm_table *t);
>>  int dm_table_alloc_md_mempools(struct dm_table *t);
>>  void dm_table_free_md_mempools(struct dm_table *t);
>>  struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
>> -- 
>> 1.8.3.1
>>
>> --
>> dm-devel mailing list
>> dm-de...@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>>


Re: [PATCH v2] dm ioctl: allow change device target type to error

2013-08-21 Thread Joe Jin
On 08/21/13 23:06, Mike Snitzer wrote:
> On Wed, Aug 21 2013 at 10:48am -0400,
> Mikulas Patocka <mpato...@redhat.com> wrote:
> 
>>
>>
>> On Wed, 21 Aug 2013, Joe Jin wrote:
>>
>>> commit a5664da "dm ioctl: make bio or request based device type immutable"
>>> prevented "dmsetup wape_table" change the target type to "error".
>>
>> That commit a5664da is there for a reason (it is not possible to change 
>> bio-based device to request-based and vice versa) and I don't really see 
>> how this patch is supposed to work.
>>
>> If there are bios that are in flight and that already passed through 
>> blk_queue_bio, and you change the device from request-based to bio-based, 
>> what are you going to do with them? - The patch doesn't do anything about 
>> it.
>>
>> A better approach would be to create a new request-based target "error-rq" 
>> and change the multipath target to "error-rq" target. That way, you don't 
>> have to change device type from request based to bio based.
> 
> My thoughts _exactly_.  This patch is very confused.
> 
> Joe, what are you looking to be able to do?  Switch a dm-multipath
> device to error?  Or allowing switching a target that has
> DM_TARGET_IMMUTABLE flag set to be switched to error target?
> 
> The latter restriction was introduced with commit 36a0456fb ("dm table:
> add immutable feature").

Hi Mike,

dmsetup supports wipe_table:
https://bugzilla.redhat.com/show_bug.cgi?id=742607
As the Doc Text in that bug puts it, "This could be useful, for example,
if a long-running process keeps a device open after it has finished using
it and you need to release the underlying devices before that process exits."

After that commit was applied, wipe_table no longer works.

Thanks,
Joe


[PATCH v2] dm ioctl: allow change device target type to error

2013-08-21 Thread Joe Jin
Commit a5664da ("dm ioctl: make bio or request based device type immutable")
prevented "dmsetup wipe_table" from changing the target type to "error".

-v2: set up md->queue even if the target type is "error".

Signed-off-by: Joe Jin <joe@oracle.com>
---
 drivers/md/dm-ioctl.c |  4 
 drivers/md/dm-table.c | 12 
 drivers/md/dm.h   |  1 +
 3 files changed, 17 insertions(+)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index f1b7586..2a9b63d 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	if (dm_is_error_target(t))
+		goto error_target;
+
 	/* Protect md->type and md->queue against concurrent table loads. */
 	dm_lock_md_type(md);
 	if (dm_get_md_type(md) == DM_TYPE_NONE)
@@ -1293,6 +1296,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+error_target:
 	/* setup md->queue to reflect md's type (may block) */
 	r = dm_setup_md_queue(md);
 	if (r) {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f221812..27be46a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 	return 0;
 }
 
+bool dm_is_error_target(struct dm_table *t)
+{
+	unsigned i;
+
+	for (i = 0; i < t->num_targets; i++) {
+		struct dm_target *tgt = t->targets + i;
+		if (strcmp(tgt->type->name, "error") == 0)
+			return true;
+	}
+	return false;
+}
+
 int dm_table_create(struct dm_table **result, fmode_t mode,
 		    unsigned num_targets, struct mapped_device *md)
 {
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 45b97da..c7bceeb 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
 struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 bool dm_table_supports_discards(struct dm_table *t);
+bool dm_is_error_target(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-- 
1.8.3.1
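The check added by this patch can be exercised outside the kernel with simplified structures. The sketch assumes nothing beyond the loop in the diff; struct toy_target, struct toy_table and the stricter toy_all_error_targets() variant are hypothetical. It makes one property easy to see: because the loop returns on the first match, a table that mixes an "error" target with, say, a "linear" target also jumps past the device-type checks in table_load().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for struct dm_target / struct dm_table. */
struct toy_target {
	const char *type_name;
};

struct toy_table {
	unsigned num_targets;
	const struct toy_target *targets;
};

/* Same loop shape as dm_is_error_target() in the patch: returns true as
 * soon as any target is "error", so a mixed table matches too. */
static bool toy_has_error_target(const struct toy_table *t)
{
	for (unsigned i = 0; i < t->num_targets; i++)
		if (strcmp(t->targets[i].type_name, "error") == 0)
			return true;
	return false;
}

/* Hypothetical stricter variant: true only if every target is "error". */
static bool toy_all_error_targets(const struct toy_table *t)
{
	if (t->num_targets == 0)
		return false;
	for (unsigned i = 0; i < t->num_targets; i++)
		if (strcmp(t->targets[i].type_name, "error") != 0)
			return false;
	return true;
}
```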



[PATCH] dm ioctl: allow change device target type to error

2013-08-21 Thread Joe Jin
Commit a5664da ("dm ioctl: make bio or request based device type immutable")
prevented "dmsetup wipe_table" from changing the target type to "error".

Signed-off-by: Joe Jin <joe@oracle.com>
---
 drivers/md/dm-ioctl.c |  6 +-
 drivers/md/dm-table.c | 12 
 drivers/md/dm.h   |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index f1b7586..1ee9e41 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1250,7 +1250,7 @@ static int populate_table(struct dm_table *table,
 
 static int table_load(struct dm_ioctl *param, size_t param_size)
 {
-	int r;
+	int r = 0;
 	struct hash_cell *hc;
 	struct dm_table *t, *old_map = NULL;
 	struct mapped_device *md;
@@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	if (dm_is_error_target(t))
+		goto error_target;
+
 	/* Protect md->type and md->queue against concurrent table loads. */
 	dm_lock_md_type(md);
 	if (dm_get_md_type(md) == DM_TYPE_NONE)
@@ -1303,6 +1306,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 	}
 	dm_unlock_md_type(md);
 
+error_target:
 	/* stage inactive table */
 	down_write(&_hash_lock);
 	hc = dm_get_mdptr(md);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f221812..27be46a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 	return 0;
 }
 
+bool dm_is_error_target(struct dm_table *t)
+{
+	unsigned i;
+
+	for (i = 0; i < t->num_targets; i++) {
+		struct dm_target *tgt = t->targets + i;
+		if (strcmp(tgt->type->name, "error") == 0)
+			return true;
+	}
+	return false;
+}
+
 int dm_table_create(struct dm_table **result, fmode_t mode,
 		    unsigned num_targets, struct mapped_device *md)
 {
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 45b97da..c7bceeb 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
 struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 bool dm_table_supports_discards(struct dm_table *t);
+bool dm_is_error_target(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-- 
1.8.3.1



Re: [dm-devel] [PATCH v2] dm ioctl: allow change device target type to error

2013-08-21 Thread Joe Jin
Mikulas, thanks for you suggestions, I create new patch, can you please help
review?

Subject: dm: add map_rq define for error

Commit a5664da ("dm ioctl: make bio or request based device type immutable")
prevented dmsetup wipe_table from changing the target type to error, because
there is no map_rq for the error target type.

Signed-off-by: Joe Jin joe@oracle.com
---
 drivers/md/dm-target.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 37ba5db..b690910 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -131,12 +131,19 @@ static int io_err_map(struct dm_target *tt, struct bio *bio)
 	return -EIO;
 }
 
+static int io_err_map_rq(struct dm_target *ti, struct request *clone,
+			 union map_info *map_context)
+{
+	return -EIO;
+}
+
 static struct target_type error_target = {
 	.name = "error",
 	.version = {1, 1, 0},
 	.ctr  = io_err_ctr,
 	.dtr  = io_err_dtr,
 	.map  = io_err_map,
+	.map_rq = io_err_map_rq,
 };
 
 int __init dm_target_init(void)
-- 
1.8.3.1

On 08/21/13 22:48, Mikulas Patocka wrote:
 
 
 On Wed, 21 Aug 2013, Joe Jin wrote:
 
 commit a5664da dm ioctl: make bio or request based device type immutable
 prevented dmsetup wipe_table from changing the target type to error.
 
 That commit a5664da is there for a reason (it is not possible to change 
 bio-based device to request-based and vice versa) and I don't really see 
 how this patch is supposed to work.
 
 If there are bios that are in flight and that already passed through 
 blk_queue_bio, and you change the device from request-based to bio-based, 
 what are you going to do with them? - The patch doesn't do anything about 
 it.
 
 A better approach would be to create a new request-based target error-rq 
 and change the multipath target to error-rq target. That way, you don't 
 have to change device type from request based to bio based.
 
 Mikulas
 
 -v2: set up md->queue even if the target type is error.

 Signed-off-by: Joe Jin joe@oracle.com
 ---
  drivers/md/dm-ioctl.c |  4 ++++
  drivers/md/dm-table.c | 12 ++++++++++++
  drivers/md/dm.h       |  1 +
  3 files changed, 17 insertions(+)

 diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
 index f1b7586..2a9b63d 100644
 --- a/drivers/md/dm-ioctl.c
 +++ b/drivers/md/dm-ioctl.c
 @@ -1280,6 +1280,9 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
  		goto out;
  	}
  
 +	if (dm_is_error_target(t))
 +		goto error_target;
 +
  	/* Protect md->type and md->queue against concurrent table loads. */
  	dm_lock_md_type(md);
  	if (dm_get_md_type(md) == DM_TYPE_NONE)
 @@ -1293,6 +1296,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
  		goto out;
  	}
  
 +error_target:
  	/* setup md->queue to reflect md's type (may block) */
  	r = dm_setup_md_queue(md);
  	if (r) {
 diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
 index f221812..27be46a 100644
 --- a/drivers/md/dm-table.c
 +++ b/drivers/md/dm-table.c
 @@ -184,6 +184,18 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
  	return 0;
  }
  
 +bool dm_is_error_target(struct dm_table *t)
 +{
 +	unsigned i;
 +
 +	for (i = 0; i < t->num_targets; i++) {
 +		struct dm_target *tgt = t->targets + i;
 +		if (strcmp(tgt->type->name, "error") == 0)
 +			return true;
 +	}
 +	return false;
 +}
 +
  int dm_table_create(struct dm_table **result, fmode_t mode,
  		    unsigned num_targets, struct mapped_device *md)
  {
 diff --git a/drivers/md/dm.h b/drivers/md/dm.h
 index 45b97da..c7bceeb 100644
 --- a/drivers/md/dm.h
 +++ b/drivers/md/dm.h
 @@ -69,6 +69,7 @@ unsigned dm_table_get_type(struct dm_table *t);
  struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
  bool dm_table_request_based(struct dm_table *t);
  bool dm_table_supports_discards(struct dm_table *t);
 +bool dm_is_error_target(struct dm_table *t);
  int dm_table_alloc_md_mempools(struct dm_table *t);
  void dm_table_free_md_mempools(struct dm_table *t);
  struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
 -- 
 1.8.3.1

 --
 dm-devel mailing list
 dm-de...@redhat.com
 https://www.redhat.com/mailman/listinfo/dm-devel



Re: [Xen-devel] [PATCH] xen: initialize xen panic handler for PVHVM

2013-08-19 Thread Joe Jin
On 08/16/13 20:43, Konrad Rzeszutek Wilk wrote:
> Could you tell me what has been happening without this patch?

Without this patch, Xen does not receive the PVHVM crash event, so the
on_crash setting in the guest configuration file is never triggered.
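The mechanism can be pictured with a rough user-space model. Panic handlers run from a notifier-style list. If the Xen handler is only registered on the PV boot path, a PVHVM guest can panic without the hypervisor ever being told. This model assumes, as the patch title suggests, that registration was missing only on the PVHVM path.

In the kernel, the real list is `panic_notifier_list` and registration goes through `atomic_notifier_chain_register()`. Everything below is hypothetical and only illustrates the ordering: `model_register()`, `model_panic()`, the two setup functions, and the `crash_reported` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HANDLERS 4

typedef void (*model_panic_fn)(void);

static model_panic_fn model_handlers[MODEL_MAX_HANDLERS];
static size_t model_nr_handlers;
static bool crash_reported;	/* stands in for "hypervisor saw the crash" */

static void model_register(model_panic_fn fn)
{
	if (model_nr_handlers < MODEL_MAX_HANDLERS)
		model_handlers[model_nr_handlers++] = fn;
}

/* Hypothetical Xen handler: would tell the hypervisor the guest crashed. */
static void model_xen_panic_event(void)
{
	crash_reported = true;
}

/* Assumed: the PV boot path already registered the handler ... */
static void model_pv_setup(void)
{
	model_register(model_xen_panic_event);
}

/* ... while the PVHVM path only does so once the patch is applied. */
static void model_pvhvm_setup(bool with_patch)
{
	if (with_patch)
		model_register(model_xen_panic_event);
}

/* panic(): walk the list; with no handler, nobody notifies Xen. */
static void model_panic(void)
{
	for (size_t i = 0; i < model_nr_handlers; i++)
		model_handlers[i]();
}

static void model_reset(void)
{
	model_nr_handlers = 0;
	crash_reported = false;
}
```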

Thanks,
Joe

Re: kernel panic in skb_copy_bits

2013-07-04 Thread Joe Jin
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream DM. Note these aren't real fixes, just a workaround
>>> of a kernel bug.
>>
>> The guest is pvm, and disk model is xvbd, guest config file as below:
> 
> Do you know which disk backend? The workaround Alex refers to went into
> qdisk but I think blkback could still suffer from a variant of the
> retransmit issue if you run it over iSCSI.
> 
>>> To fix on a local build of xen you will need something like this:
>>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
>>> and something like this (NB: obviously insert your own git
>>> repo and commit numbers)
>>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>>>
>>
>> I think this only for pvhvm/hvm?
> 
> No, the underlying issue affects any PV device which is run over a
> network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> cross over the delayed ack and cause I/O to be completed while
> retransmits are pending, such as is described in
> http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> variant). The problem is that because Xen PV drivers often unmap the
> page on I/O completion you get a crash (page fault) on the retransmit.
> 

Could we record the grant page's refcount when mapping it, and on unmap check
whether the refcount is still the same as at mapping time? This change would
be limited to xen-blkback.
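The first idea can be sketched as a user-space model. This is not xen-blkback code. `struct model_page`, `model_map()`, `model_get()`, `model_put()` and `model_try_unmap()` are hypothetical, and a plain integer stands in for the page refcount (the kernel would look at page_count() on the mapped page).

The model works as follows:
- It snapshots the count when the grant is mapped.
- At unmap time, a higher count means someone, for example an skb queued for TCP retransmit, still holds the page, so the unmap is deferred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a grant-mapped page. */
struct model_page {
	int refcount;		/* models page_count() */
	int refcount_at_map;	/* snapshot taken when the grant was mapped */
	bool mapped;
};

static void model_map(struct model_page *pg)
{
	pg->refcount = 1;
	pg->refcount_at_map = pg->refcount;
	pg->mapped = true;
}

/* e.g. the network stack taking and dropping a reference for a retransmit */
static void model_get(struct model_page *pg)
{
	pg->refcount++;
}

static void model_put(struct model_page *pg)
{
	pg->refcount--;
}

/*
 * Returns true if the page was unmapped, false if an extra reference is
 * still outstanding and the unmap has to be retried later.
 */
static bool model_try_unmap(struct model_page *pg)
{
	if (pg->refcount != pg->refcount_at_map)
		return false;
	pg->mapped = false;
	return true;
}
```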

Another way is to add a new page flag such as PG_send: set the bit when
sendpage() is called and clear it when the page is put. xen-blkback could
then wait on the page queue.
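The second idea, a per-page "being sent" bit, might look like this in a user-space model. PG_send does not exist in the kernel. `MODEL_PG_SEND`, `model_sendpage()`, `model_put_page()` and `model_can_unmap()` are hypothetical. A real version would need atomic bit operations and a wait queue so xen-blkback can sleep until the bit clears; here the wait is reduced to a check.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PG_SEND (1UL << 0)	/* hypothetical "page is in flight" bit */

struct model_page {
	unsigned long flags;
};

/* Called when the page is handed to the network stack via sendpage(). */
static void model_sendpage(struct model_page *pg)
{
	pg->flags |= MODEL_PG_SEND;
}

/* Called when the network stack puts the page. */
static void model_put_page(struct model_page *pg)
{
	pg->flags &= ~MODEL_PG_SEND;
}

/* xen-blkback would wait until this returns true before unmapping. */
static bool model_can_unmap(const struct model_page *pg)
{
	return !(pg->flags & MODEL_PG_SEND);
}
```

A single bit cannot tell how many skbs still reference the page (for example, an original transmit plus a retransmit clone). That may make the refcount comparison the more robust of the two ideas.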

Thanks,
Joe

> The issue also affects native but in that case the symptom is "just" a
> corrupt packet on the wire. I tried to address this with my "skb
> destructor" series but unfortunately I got bogged down on the details,
> then I had to take time out to look into some other stuff and never
> managed to get back into it. I'd be very grateful if there was someone
> who could pick up that work (Alex gave some useful references in another
> reply to this thread)
> 
> Some PV disk backends (e.g. blktap2) have worked around this by using
> grant copy instead of grant map, others (e.g. qdisk) have disabled
> O_DIRECT so that the pages are copied into the dom0 page cache and
> transmitted from there.
> 
> We were discussing recently the possibility of mapping all ballooned out
> pages to a single read-only scratch page instead of leaving them empty
> in the page tables, this would cause the Xen case to revert to the
> native case. I think Thanos was going to take a look into this.
> 
> Ian.
> 
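The scratch-page idea above can be modelled roughly as follows. Instead of leaving a ballooned-out slot empty, so that a late access faults, the slot points at one shared read-only page. A late retransmit then reads harmless data, which matches the native-case symptom (a corrupt packet) rather than a crash. All names are hypothetical, and a NULL pointer models an empty page-table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16

/* One shared, read-only page; static storage is zero-initialised. */
static const unsigned char model_scratch_page[MODEL_PAGE_SIZE];

struct model_slot {
	const unsigned char *page;	/* NULL models an empty PTE */
};

/* Unmap either by clearing the entry or by pointing it at the scratch page. */
static void model_unmap(struct model_slot *slot, int use_scratch)
{
	slot->page = use_scratch ? model_scratch_page : NULL;
}

/* Returns 0 on success, -1 where the kernel would take a page fault. */
static int model_late_read(const struct model_slot *slot,
			   unsigned char *dst, size_t len)
{
	if (slot->page == NULL || len > MODEL_PAGE_SIZE)
		return -1;
	memcpy(dst, slot->page, len);
	return 0;
}
```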


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

Re: kernel panic in skb_copy_bits

2013-07-01 Thread Joe Jin
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream DM. Note these aren't real fixes, just a workaround
>>> of a kernel bug.
>>
>> The guest is pvm, and disk model is xvbd, guest config file as below:
> 
> Do you know which disk backend? The workaround Alex refers to went into
> qdisk but I think blkback could still suffer from a variant of the
> retransmit issue if you run it over iSCSI.

The backend is xen-blkback on iSCSI storage.
 
> 
>>> To fix on a local build of xen you will need something like this:
>>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
>>> and something like this (NB: obviously insert your own git
>>> repo and commit numbers)
>>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>>>
>>
>> I think this only for pvhvm/hvm?
> 
> No, the underlying issue affects any PV device which is run over a
> network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> cross over the delayed ack and cause I/O to be completed while
> retransmits are pending, such as is described in
> http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> variant). The problem is that because Xen PV drivers often unmap the
> page on I/O completion you get a crash (page fault) on the retransmit.
> 
To prevent iSCSI from reusing the page via sendpage(), we disabled sg
(scatter-gather) on the NIC, and in our tests the panic went away. This also
confirms that the page is unmapped by the grant system; the symptom is the
same as the NFS panic.


> The issue also affects native but in that case the symptom is "just" a
> corrupt packet on the wire. I tried to address this with my "skb
> destructor" series but unfortunately I got bogged down on the details,
> then I had to take time out to look into some other stuff and never
> managed to get back into it. I'd be very grateful if there was someone
> who could pick up that work (Alex gave some useful references in another
> reply to this thread)
> 
> Some PV disk backends (e.g. blktap2) have worked around this by using
> grant copy instead of grant map, others (e.g. qdisk) have disabled
> O_DIRECT so that the pages are copied into the dom0 page cache and
> transmitted from there.

This workaround is equivalent to disabling sg on the NIC: with sg disabled,
sendpage() makes its own copy of the page rather than reusing it.
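Why disabling sg helps can be shown with a small model of the two transmit strategies:
- With scatter-gather, the skb keeps a reference to the caller's page (zero copy).
- Without it, the data is copied into a buffer the socket owns, so a later unmap of the original page cannot affect a retransmit.

As far as we know, tcp_sendpage() in kernels of this era falls back to sock_no_sendpage(), which copies, when the route lacks NETIF_F_SG. The structs and functions below are hypothetical, and the `mapped` flag stands in for the grant unmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_LEN 8

struct model_page {
	unsigned char data[MODEL_LEN];
	bool mapped;			/* false once the grant is unmapped */
};

struct model_skb {
	struct model_page *frag;	/* zero-copy: points at caller's page */
	unsigned char copy[MODEL_LEN];	/* copy path: private data */
	bool uses_frag;
};

static void model_sendpage(struct model_skb *skb, struct model_page *pg,
			   bool dev_has_sg)
{
	skb->uses_frag = dev_has_sg;
	if (dev_has_sg) {
		skb->frag = pg;			/* keep a reference, no copy */
	} else {
		skb->frag = NULL;
		memcpy(skb->copy, pg->data, MODEL_LEN);	/* own copy */
	}
}

/* Returns 0 if the retransmit can read its data, -1 if it would fault. */
static int model_retransmit(const struct model_skb *skb, unsigned char *out)
{
	if (skb->uses_frag) {
		if (!skb->frag->mapped)
			return -1;
		memcpy(out, skb->frag->data, MODEL_LEN);
	} else {
		memcpy(out, skb->copy, MODEL_LEN);
	}
	return 0;
}
```

The cost of the copy path is one memcpy per send, which is the same trade-off blktap2 (grant copy) and qdisk (no O_DIRECT) make.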

Thanks,
Joe
> 
> We were discussing recently the possibility of mapping all ballooned out
> pages to a single read-only scratch page instead of leaving them empty
> in the page tables, this would cause the Xen case to revert to the
> native case. I think Thanos was going to take a look into this.
> 
> Ian.
> 

Re: kernel panic in skb_copy_bits

2013-06-30 Thread Joe Jin
On 06/30/13 17:13, Alex Bligh wrote:
> 
> 
> --On 28 June 2013 12:17:43 +0800 Joe Jin  wrote:
> 
>> Find a similar issue
>> http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen
>> developer as well.
> 
> I thought this sounded familiar. I haven't got the start of this
> thread, but what version of Xen are you running and what device
> model? If before 4.3, there is a page lifetime bug in the kernel
> (not the xen code) which can affect anything where the guest accesses
> the host's block stack and that in turn accesses the networking
> stack (it may in fact be wider than that). So, e.g. domU on
> iSCSI will do it. It tends to get triggered by a TCP retransmit
> or (on NFS) the RPC equivalent. Essentially block operation
> is considered complete, returning through xen and freeing the
> grant table entry, and yet something in the kernel (e.g. tcp
> retransmit) can still access the data. The nature of the bug
> is extensively discussed in that thread - you'll also find
> a reference to a thread on linux-nfs which concludes it
> isn't an nfs problem, and even some patches to fix it in the
> kernel adding reference counting.

Do you know if there is a fix for the above? So far we also suspect that the
grant page is unmapped too early; we are using 4.1 stable in our tests.

> 
> A workaround is to turn off O_DIRECT use by Xen as that ensures
> the pages are copied. Xen 4.3 does this by default.
> 
> I believe fixes for this are in 4.3 and 4.2.2 if using the
> qemu upstream DM. Note these aren't real fixes, just a workaround
> of a kernel bug.

The guest is PVM, the disk model is xvbd, and the guest config file is as follows:

vif = ['mac=00:21:f6:00:00:01,bridge=c0a80b00']
OVM_simple_name = 'Guest#1'
disk = ['file:/OVS/Repositories/0004fb0391e9eae94d1e907c/VirtualDisks/0004fb12f78799dad800ef47.img,xvda,w',
        'phy:/dev/mapper/360060e8010141870058b41570002,xvdb,w',
        'phy:/dev/mapper/360060e8010141870058b41570003,xvdc,w']
bootargs = ''
uuid = '0004fb00-0006--2b00-77a4766001ed'
on_reboot = 'restart'
cpu_weight = 27500
OVM_os_type = 'Oracle Linux 5'
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
memory = 4096
OVM_description = ''
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
guest_os_type = 'linux'
name = '0004fb062b0077a4766001ed'
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'

> 
> To fix on a local build of xen you will need something like this:
> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> and something like this (NB: obviously insert your own git
> repo and commit numbers)
> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
> 

I think this is only for PVHVM/HVM?


Thanks,
Joe
> Also note those fixes are (technically) unsafe for live migration
> unless there is an ordering change made in qemu's block open
> call.
> 
> Of course this might be something completely different.
> 



Re: kernel panic in skb_copy_bits

2013-06-29 Thread Joe Jin
On 06/29/13 15:20, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
>> Hi Eric,
>>
>> The patch not fix the issue and panic as same as early I posted:
>>> BUG: unable to handle kernel paging request at 88006d9e8d48
>>> IP: [] memcpy+0xb/0x120
>>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>>> Oops:  [#1] SMP 
>>> CPU 7 
>>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback 
>>> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding 
>>> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core 
>>> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio 
>>> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi 
>>> xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler 
>>> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper 
>>> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event 
>>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm 
>>> snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support 
>>> pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core 
>>> hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage 
>>> lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase 
>>> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>>
>>>
>>> Pid: 0, comm: swapper Tainted: GW   2.6.39-300.32.1.el5uek #1 Dell 
>>> Inc. PowerEdge 2950/0DP246
> 
> 
> By the way my patch was for current kernels, not for 2.6.39
> 
> For instance, I was not able to reproduce the crash with 3.3
> 
> RCU in neighbour code was added in 2.6.37, but it looks like this code
> is a bit fragile because all the kfree_skb() are done while neighbour
> locks are held.
> 
> So if a skb destructor triggers a new call to neighbour code, I presume
> some bad things can happen. LOCKDEP could eventually help to detect
> this.
> 
> You could try to replace these kfree_skb() calls to dev_kfree_skb_irq()
> just in case.
> 
> (Do not forget the __skb_queue_purge() ones)
> 
> Try a LOCKDEP build as well.
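Eric's concern is re-entrancy. If kfree_skb() runs an skb destructor while the neighbour lock is held, and that destructor calls back into neighbour code, the same lock is taken again. dev_kfree_skb_irq() avoids this by queueing the skb and freeing it later from softirq context.

The model below is a hypothetical sketch, not neighbour.c code:
- A plain flag stands in for a non-recursive lock.
- A small array stands in for the deferred completion queue.
- It assumes the queue never fills.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool neigh_lock_held;	/* models a non-recursive neighbour lock */
static int reentry_errors;	/* times a destructor found the lock held */

struct model_skb {
	void (*destructor)(struct model_skb *skb);
};

/* A destructor that calls back into "neighbour code", i.e. takes the lock. */
static void model_destructor(struct model_skb *skb)
{
	(void)skb;
	if (neigh_lock_held) {
		reentry_errors++;	/* real code would deadlock or corrupt state */
		return;
	}
	neigh_lock_held = true;
	/* ... neighbour work ... */
	neigh_lock_held = false;
}

#define MODEL_QLEN 8
static struct model_skb *deferred[MODEL_QLEN];
static size_t nr_deferred;

/* Immediate free: runs the destructor now, whatever locks are held. */
static void model_kfree_skb(struct model_skb *skb)
{
	skb->destructor(skb);
}

/* Deferred free: only queues the skb, in the spirit of dev_kfree_skb_irq(). */
static void model_kfree_skb_deferred(struct model_skb *skb)
{
	if (nr_deferred < MODEL_QLEN)	/* model assumes the queue never fills */
		deferred[nr_deferred++] = skb;
}

/* Later, with no neighbour lock held (the softirq in the real kernel). */
static void model_flush_deferred(void)
{
	while (nr_deferred > 0)
		model_kfree_skb(deferred[--nr_deferred]);
}
```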

So far we suspect it is caused by iSCSI calling sendpage(): the page is later
unmapped while the stack is still trying to copy the skb. We'll try disabling
sg to see whether that helps.

Thanks,
Joe

Re: kernel panic in skb_copy_bits

2013-06-28 Thread Joe Jin
Hi Eric,

The patch does not fix the issue; the panic is the same as the one I posted earlier:
> BUG: unable to handle kernel paging request at 88006d9e8d48
> IP: [] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> Oops:  [#1] SMP 
> CPU 7 
> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback 
> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding 
> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core 
> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio 
> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs 
> xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler 
> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper 
> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event 
> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer 
> snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi 
> dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed 
> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc 
> scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase 
> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>
>
> Pid: 0, comm: swapper Tainted: GW   2.6.39-300.32.1.el5uek #1 Dell 
> Inc. PowerEdge 2950/0DP246
> RIP: e030:[]  [] memcpy+0xb/0x120
> RSP: e02b:8801003c3d58  EFLAGS: 00010246
> RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
> RDX:  RSI: 88006d9e8d48 RDI: 880076b9e280
> RBP: 8801003c3dc0 R08: 000bf723 R09: 
> R10:  R11: 000a R12: 0034
> R13: 0034 R14: 02b8 R15: 05a8
> FS:  7fc1e852a6e0() GS:8801003c() knlGS:
> CS:  e033 DS: 002b ES: 002b CR0: 8005003b
> CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
> Stack:
>  8142db21  880076b9e280 8800637097f0
>  02ec 02b8 880077ac 
>  8800637097f0 880066c9a7c0 fdb4 024c
> Call Trace:
>  <IRQ>
>  [8142db21] ? skb_copy_bits+0x1c1/0x2e0
>  [8142f173] skb_copy+0xf3/0x120
>  [81447fbc] neigh_timer_handler+0x1ac/0x350
>  [810573fe] ? account_idle_ticks+0xe/0x10
>  [81447e10] ? neigh_alloc+0x180/0x180
>  [8107dbaa] call_timer_fn+0x4a/0x110
>  [81447e10] ? neigh_alloc+0x180/0x180
>  [8107f82a] run_timer_softirq+0x13a/0x220
>  [81075c39] __do_softirq+0xb9/0x1d0
>  [810d9678] ? handle_percpu_irq+0x48/0x70
>  [81511d3c] call_softirq+0x1c/0x30
>  [810172e5] do_softirq+0x65/0xa0
>  [8107656b] irq_exit+0xab/0xc0
>  [812f97d5] xen_evtchn_do_upcall+0x35/0x50
>  [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
>  <EOI>
>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>  [8100a0b0] ? xen_safe_halt+0x10/0x20
>  [8101dfeb] ? default_idle+0x5b/0x170
>  [81014ac6] ? cpu_idle+0xc6/0xf0
>  [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
>  [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d
> 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3
> a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
> RIP  [812605bb] memcpy+0xb/0x120
>  RSP 8801003c3d58
> CR2: 88006d9e8d48

Thanks,
Joe
On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
> 
> 
> [PATCH] neighbour: fix a race in neigh_destroy()
> 
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
> 
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
> 
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
> 
> And hold neigh->lock in neigh_destroy() to close the race.
> 
> Reported-by: Joe Jin 
> Signed-off-by: Eric Dumazet 
> ---
>  net/core/neighbour.c |   12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  we must kill timers etc. and move
>  it to safe state.
>*/
>

Re: kernel panic in skb_copy_bits

2013-06-28 Thread Joe Jin
Hi Eric,

Thanks for your patch; I'll test it and get back to you.

Regards,
Joe
On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
> 
> 
> [PATCH] neighbour: fix a race in neigh_destroy()
> 
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
> 
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
> 
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
> 
> And hold neigh->lock in neigh_destroy() to close the race.
> 
> Reported-by: Joe Jin 
> Signed-off-by: Eric Dumazet 
> ---
>  net/core/neighbour.c |   12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  we must kill timers etc. and move
>  it to safe state.
>*/
> - skb_queue_purge(&n->arp_queue);
> + __skb_queue_purge(&n->arp_queue);
>   n->arp_queue_len_bytes = 0;
>   n->output = neigh_blackhole;
>   if (n->nud_state & NUD_VALID)
> @@ -286,7 +286,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
>   if (!n)
>   goto out_entries;
>  
> - skb_queue_head_init(&n->arp_queue);
> + __skb_queue_head_init(&n->arp_queue);
>   rwlock_init(&n->lock);
>   seqlock_init(&n->ha_lock);
>   n->updated= n->used = now;
> @@ -708,7 +708,9 @@ void neigh_destroy(struct neighbour *neigh)
>   if (neigh_del_timer(neigh))
>   pr_warn("Impossible event\n");
>  
> - skb_queue_purge(&neigh->arp_queue);
> + write_lock_bh(&neigh->lock);
> + __skb_queue_purge(&neigh->arp_queue);
> + write_unlock_bh(&neigh->lock);
>   neigh->arp_queue_len_bytes = 0;
>  
>   if (dev->netdev_ops->ndo_neigh_destroy)
> @@ -858,7 +860,7 @@ static void neigh_invalidate(struct neighbour *neigh)
>   neigh->ops->error_report(neigh, skb);
>   write_lock(&neigh->lock);
>   }
> - skb_queue_purge(&neigh->arp_queue);
> + __skb_queue_purge(&neigh->arp_queue);
>   neigh->arp_queue_len_bytes = 0;
>  }
>  
> @@ -1210,7 +1212,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>  
>   write_lock_bh(&neigh->lock);
>   }
> - skb_queue_purge(&neigh->arp_queue);
> + __skb_queue_purge(&neigh->arp_queue);
>   neigh->arp_queue_len_bytes = 0;
>   }
>  out:
> 
> 
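As a rough illustration of the locking rule this patch relies on (arp_queue has no lock of its own, so every path that touches it, including the purge in neigh_destroy(), must hold neigh->lock for writing), here is a userspace sketch. It is an analogy under stated assumptions, not kernel code: `struct fake_neigh`, `struct fake_skb` and every helper name are hypothetical, a POSIX rwlock stands in for neigh->lock, and a singly linked list stands in for the sk_buff_head.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff sitting on arp_queue. */
struct fake_skb {
    struct fake_skb *next;
    size_t len;
};

/* Hypothetical stand-in for struct neighbour. The queue has no lock of
 * its own (cf. __skb_queue_head_init()); 'lock' plays the role of
 * neigh->lock and is the only thing protecting arp_queue. */
struct fake_neigh {
    pthread_rwlock_t lock;
    struct fake_skb *arp_queue;
    size_t arp_queue_len_bytes;
};

static int fake_neigh_init(struct fake_neigh *n)
{
    n->arp_queue = NULL;
    n->arp_queue_len_bytes = 0;
    return pthread_rwlock_init(&n->lock, NULL);
}

/* Analogue of __skb_queue_purge(): no internal locking, so the caller
 * must already hold n->lock for writing. */
static void queue_purge_locked(struct fake_neigh *n)
{
    struct fake_skb *skb = n->arp_queue;

    while (skb) {
        struct fake_skb *next = skb->next;
        free(skb);
        skb = next;
    }
    n->arp_queue = NULL;
    n->arp_queue_len_bytes = 0;
}

/* Append a packet while holding the write lock. */
static int queue_tail(struct fake_neigh *n, size_t len)
{
    struct fake_skb *skb = malloc(sizeof(*skb));
    struct fake_skb **pp;

    if (!skb)
        return -1;
    skb->next = NULL;
    skb->len = len;

    pthread_rwlock_wrlock(&n->lock);
    for (pp = &n->arp_queue; *pp; pp = &(*pp)->next) {
        /* walk to the tail of the list */
    }
    *pp = skb;
    n->arp_queue_len_bytes += len;
    pthread_rwlock_unlock(&n->lock);
    return 0;
}

/* Readers take the same lock (shared mode in this model). */
static size_t queue_bytes(struct fake_neigh *n)
{
    size_t bytes;

    pthread_rwlock_rdlock(&n->lock);
    bytes = n->arp_queue_len_bytes;
    pthread_rwlock_unlock(&n->lock);
    return bytes;
}

/* Shape of the patched neigh_destroy(): the purge runs with the write
 * lock held, so it cannot free entries another lock holder is using. */
static void fake_neigh_destroy(struct fake_neigh *n)
{
    pthread_rwlock_wrlock(&n->lock);
    queue_purge_locked(n);
    assert(n->arp_queue_len_bytes == 0);
    pthread_rwlock_unlock(&n->lock);
    pthread_rwlock_destroy(&n->lock);
}
```

In this model, calling queue_purge_locked() without first taking the write lock would reproduce the shape of the race described in the patch: another path holding the lock could be walking entries while they are freed.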


Re: kernel panic in skb_copy_bits

2013-06-27 Thread Joe Jin
I found a similar issue here:
http://www.gossamer-threads.com/lists/xen/devel/265611
so I have copied the Xen developers as well.

On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> While doing a failover test with iSCSI + multipath by resetting the switches
>> on OVM (2.6.39), we hit this panic:
>>
>> BUG: unable to handle kernel paging request at 88006d9e8d48
>> IP: [812605bb] memcpy+0xb/0x120
>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>> Oops:  [#1] SMP 
>> CPU 7 
>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback 
>> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding 
>> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core 
>> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio 
>> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs 
>> xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler 
>> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper 
>> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event 
>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer 
>> snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi 
>> dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed 
>> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc 
>> scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase 
>> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>
>>
>> Pid: 0, comm: swapper Tainted: GW   2.6.39-300.32.1.el5uek #1 Dell 
>> Inc. PowerEdge 2950/0DP246
>> RIP: e030:[812605bb]  [812605bb] memcpy+0xb/0x120
>> RSP: e02b:8801003c3d58  EFLAGS: 00010246
>> RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
>> RDX:  RSI: 88006d9e8d48 RDI: 880076b9e280
>> RBP: 8801003c3dc0 R08: 000bf723 R09: 
>> R10:  R11: 000a R12: 0034
>> R13: 0034 R14: 02b8 R15: 05a8
>> FS:  7fc1e852a6e0() GS:8801003c() knlGS:
>> CS:  e033 DS: 002b ES: 002b CR0: 8005003b
>> CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
>> DR0:  DR1:  DR2: 
>> DR3:  DR6: 0ff0 DR7: 0400
>> Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
>> Stack:
>>  8142db21  880076b9e280 8800637097f0
>>  02ec 02b8 880077ac 
>>  8800637097f0 880066c9a7c0 fdb4 024c
>> Call Trace:
>>  <IRQ>
>>  [8142db21] ? skb_copy_bits+0x1c1/0x2e0
>>  [8142f173] skb_copy+0xf3/0x120
>>  [81447fbc] neigh_timer_handler+0x1ac/0x350
>>  [810573fe] ? account_idle_ticks+0xe/0x10
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107dbaa] call_timer_fn+0x4a/0x110
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107f82a] run_timer_softirq+0x13a/0x220
>>  [81075c39] __do_softirq+0xb9/0x1d0
>>  [810d9678] ? handle_percpu_irq+0x48/0x70
>>  [81511d3c] call_softirq+0x1c/0x30
>>  [810172e5] do_softirq+0x65/0xa0
>>  [8107656b] irq_exit+0xab/0xc0
>>  [812f97d5] xen_evtchn_do_upcall+0x35/0x50
>>  [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
>>  <EOI>
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [8100a0b0] ? xen_safe_halt+0x10/0x20
>>  [8101dfeb] ? default_idle+0x5b/0x170
>>  [81014ac6] ? cpu_idle+0xc6/0xf0
>>  [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
>>  [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
>> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43
>> 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1
>> f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
>> RIP  [812605bb] memcpy+0xb/0x120
>>  RSP 8801003c3d58
>> CR2: 88006d9e8d48
>>
>> Reviewing the vmcore, I found skb->users was 1 at that moment. Checking the
>> network neighbour history, I found that skb_get() was replaced by skb_copy()
>> in commit 7e36763b2c:
>>
>> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
>> Author: Frank Blaschka
>> Date:   Mon Mar 3 12:16:04 2008 -0800
>>
>> [NET]: Fix race in generic address resolution.
>>
>> neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
>> has increased skbs refcount and calls so

Re: kernel panic in skb_copy_bits

2013-06-27 Thread Joe Jin
Hi Eric,

Thanks for your response; I will test it and get back to you.

Regards,
Joe
On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> While doing a failover test with iSCSI + multipath by resetting the switches
>> on OVM (2.6.39), we hit this panic:
>>
>> BUG: unable to handle kernel paging request at 88006d9e8d48
>> IP: [812605bb] memcpy+0xb/0x120
>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>> Oops:  [#1] SMP 
>> CPU 7 
>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback 
>> xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding 
>> be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core 
>> ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio 
>> dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs 
>> xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler 
>> parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper 
>> drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event 
>> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer 
>> snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi 
>> dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed 
>> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc 
>> scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase 
>> scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>
>>
>> Pid: 0, comm: swapper Tainted: GW   2.6.39-300.32.1.el5uek #1 Dell 
>> Inc. PowerEdge 2950/0DP246
>> RIP: e030:[812605bb]  [812605bb] memcpy+0xb/0x120
>> RSP: e02b:8801003c3d58  EFLAGS: 00010246
>> RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
>> RDX:  RSI: 88006d9e8d48 RDI: 880076b9e280
>> RBP: 8801003c3dc0 R08: 000bf723 R09: 
>> R10:  R11: 000a R12: 0034
>> R13: 0034 R14: 02b8 R15: 05a8
>> FS:  7fc1e852a6e0() GS:8801003c() knlGS:
>> CS:  e033 DS: 002b ES: 002b CR0: 8005003b
>> CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
>> DR0:  DR1:  DR2: 
>> DR3:  DR6: 0ff0 DR7: 0400
>> Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
>> Stack:
>>  8142db21  880076b9e280 8800637097f0
>>  02ec 02b8 880077ac 
>>  8800637097f0 880066c9a7c0 fdb4 024c
>> Call Trace:
>>  <IRQ>
>>  [8142db21] ? skb_copy_bits+0x1c1/0x2e0
>>  [8142f173] skb_copy+0xf3/0x120
>>  [81447fbc] neigh_timer_handler+0x1ac/0x350
>>  [810573fe] ? account_idle_ticks+0xe/0x10
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107dbaa] call_timer_fn+0x4a/0x110
>>  [81447e10] ? neigh_alloc+0x180/0x180
>>  [8107f82a] run_timer_softirq+0x13a/0x220
>>  [81075c39] __do_softirq+0xb9/0x1d0
>>  [810d9678] ? handle_percpu_irq+0x48/0x70
>>  [81511d3c] call_softirq+0x1c/0x30
>>  [810172e5] do_softirq+0x65/0xa0
>>  [8107656b] irq_exit+0xab/0xc0
>>  [812f97d5] xen_evtchn_do_upcall+0x35/0x50
>>  [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
>>  <EOI>
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [810013aa] ? xen_hypercall_sched_op+0xa/0x20
>>  [8100a0b0] ? xen_safe_halt+0x10/0x20
>>  [8101dfeb] ? default_idle+0x5b/0x170
>>  [81014ac6] ? cpu_idle+0xc6/0xf0
>>  [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
>>  [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
>> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43
>> 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1
>> f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
>> RIP  [812605bb] memcpy+0xb/0x120
>>  RSP 8801003c3d58
>> CR2: 88006d9e8d48
>>
>> Reviewing the vmcore, I found skb->users was 1 at that moment. Checking the
>> network neighbour history, I found that skb_get() was replaced by skb_copy()
>> in commit 7e36763b2c:
>>
>> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
>> Author: Frank Blaschka
>> Date:   Mon Mar 3 12:16:04 2008 -0800
>>
>> [NET]: Fix race in generic address resolution.
>>
>> neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
>> has increased skbs refcount and calls solicit with the

kernel panic in skb_copy_bits

2013-06-26 Thread Joe Jin
Hi,

While doing a failover test with iSCSI + multipath by resetting the switches
on OVM (2.6.39), we hit this panic:

BUG: unable to handle kernel paging request at 88006d9e8d48
IP: [812605bb] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
Oops:  [#1] SMP 
CPU 7 
Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback 
xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi 
iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr 
iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin 
dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video 
sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe 
dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit 
i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss 
snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt 
pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy 
ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash 
dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp 
mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache


Pid: 0, comm: swapper Tainted: GW   2.6.39-300.32.1.el5uek #1 Dell Inc. 
PowerEdge 2950/0DP246
RIP: e030:[812605bb]  [812605bb] memcpy+0xb/0x120
RSP: e02b:8801003c3d58  EFLAGS: 00010246
RAX: 880076b9e280 RBX: 8800714d2c00 RCX: 0057
RDX:  RSI: 88006d9e8d48 RDI: 880076b9e280
RBP: 8801003c3dc0 R08: 000bf723 R09: 
R10:  R11: 000a R12: 0034
R13: 0034 R14: 02b8 R15: 05a8
FS:  7fc1e852a6e0() GS:8801003c() knlGS:
CS:  e033 DS: 002b ES: 002b CR0: 8005003b
CR2: 88006d9e8d48 CR3: 6370b000 CR4: 2660
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 880077ac, task 880077abe240)
Stack:
 8142db21  880076b9e280 8800637097f0
 02ec 02b8 880077ac 
 8800637097f0 880066c9a7c0 fdb4 024c
Call Trace:
 <IRQ>
 [8142db21] ? skb_copy_bits+0x1c1/0x2e0
 [8142f173] skb_copy+0xf3/0x120
 [81447fbc] neigh_timer_handler+0x1ac/0x350
 [810573fe] ? account_idle_ticks+0xe/0x10
 [81447e10] ? neigh_alloc+0x180/0x180
 [8107dbaa] call_timer_fn+0x4a/0x110
 [81447e10] ? neigh_alloc+0x180/0x180
 [8107f82a] run_timer_softirq+0x13a/0x220
 [81075c39] __do_softirq+0xb9/0x1d0
 [810d9678] ? handle_percpu_irq+0x48/0x70
 [81511d3c] call_softirq+0x1c/0x30
 [810172e5] do_softirq+0x65/0xa0
 [8107656b] irq_exit+0xab/0xc0
 [812f97d5] xen_evtchn_do_upcall+0x35/0x50
 [81511d8e] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [810013aa] ? xen_hypercall_sched_op+0xa/0x20
 [810013aa] ? xen_hypercall_sched_op+0xa/0x20
 [8100a0b0] ? xen_safe_halt+0x10/0x20
 [8101dfeb] ? default_idle+0x5b/0x170
 [81014ac6] ? cpu_idle+0xc6/0xf0
 [8100a8c9] ? xen_irq_enable_direct_reloc+0x4/0x4
 [814f7bbe] ? cpu_bringup_and_idle+0xe/0x10
Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d
48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4
c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP  [812605bb] memcpy+0xb/0x120
 RSP 8801003c3d58
CR2: 88006d9e8d48

Reviewing the vmcore, I found skb->users was 1 at that moment. Checking the
network neighbour history, I found that skb_get() was replaced by skb_copy()
in commit 7e36763b2c:

commit 7e36763b2c204d59de4e88087f84a2c0c8421f25   
Author: Frank Blaschka 
Date:   Mon Mar 3 12:16:04 2008 -0800

[NET]: Fix race in generic address resolution.

neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
has increased skbs refcount and calls solicit with the
skb. neigh_timer_handler should not increase skbs refcount but make a 
copy of the skb and do solicit with the copy. 

Signed-off-by: Frank Blaschka <frank.blasc...@de.ibm.com>
Signed-off-by: David S. Miller <da...@davemloft.net>

So could you please give some details of the race? Per the vmcore, it seems
the skb data had already been freed; I suspect an skb_get() is missing
somewhere. After I reverted the above commit, the panic did not occur during
our testing.

Any input would be appreciated!

Best Regards,
Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ACPI: update user_policy.max when _PPC updated

2013-06-06 Thread Joe Jin
On 06/07/13 03:54, Rafael J. Wysocki wrote:
> Do you mean you set a limit in the BIOS setup and the kernel changed that
> limit on boot?

Sorry for the confusion.

The issue is this: when hardcap is disabled before the kernel boots, any later
change of _PPC updates scaling_max_freq properly.

If hardcap is enabled before the kernel boots, then even after we disable it,
scaling_max_freq is not updated to the maximum frequency; it stays at the
value it had when the system came up.

Reviewing the related code, I found that the limit comes from user_policy.max:
if user_policy.max is set to 1000 MHz at boot, no later change of _PPC can
raise scaling_max_freq. I don't think this is the expected behavior; please
advise.

Thanks,
Joe


Re: [PATCH] ACPI: update user_policy.max when _PPC updated

2013-06-06 Thread Joe Jin
On 06/06/13 19:06, Rafael J. Wysocki wrote:
> On Thursday, June 06, 2013 08:27:08 AM Joe Jin wrote:
>> On 06/06/13 04:40, Rafael J. Wysocki wrote:
>>> On Wednesday, June 05, 2013 08:52:52 AM Joe Jin wrote:
>>>> When _PPC changes dynamically, user_policy.max is not updated, which
>>>> prevents the CPU from running at the highest frequency.
>>>
>>> Why should the user setting be always related to the current maximum
>>> available frequency?  What if the user sets the limit for power capping purposes?
>>
>> cpufreq_update_policy() takes policy->max from user_policy.max:
>>
>> 1782 int cpufreq_update_policy(unsigned int cpu)
>> 1783 {
>> [...]
>> 1800 policy.min = data->user_policy.min;
>> 1801 policy.max = data->user_policy.max;
>> 1802 policy.policy = data->user_policy.policy;
>> 1803 policy.governor = data->user_policy.governor;
>> [...]
>> 1819 ret = __cpufreq_set_policy(data, &policy);
>> [...]
>>
>> Writing /sys/devices/system/cpu/cpu$/cpufreq/scaling_max_freq updates both
>> policy->max and user_policy.max, so I think a _PPC change also needs to
>> update these two?
> 
> Yes, if policy.max happens to be greater than the maximum available frequency,
> then (and only then) it probably should be updated.  It should never be bumped
> up, though.

Does this mean that if I enable hardcap before the kernel boots, and later,
after the system is up, I disable hardcap, I have to raise the max frequency
manually?

Thanks,
Joe

> 
> Thanks,
> Rafael
> 
> 
>>>> Signed-off-by: Joe Jin 
>>>> Cc: Rafael J. Wysocki 
>>>> Cc: Viresh Kumar 
>>>> ---
>>>>  drivers/acpi/processor_perflib.c | 17 -
>>>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/acpi/processor_perflib.c 
>>>> b/drivers/acpi/processor_perflib.c
>>>> index e854582..e01aa7d 100644
>>>> --- a/drivers/acpi/processor_perflib.c
>>>> +++ b/drivers/acpi/processor_perflib.c
>>>> @@ -180,6 +180,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, 
>>>> int status)
>>>>  int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int 
>>>> event_flag)
>>>>  {
>>>>int ret;
>>>> +  unsigned int saved = (unsigned int)pr->performance_platform_limit;
>>>>  
>>>>if (ignore_ppc) {
>>>>/*
>>>> @@ -204,8 +205,22 @@ int acpi_processor_ppc_has_changed(struct 
>>>> acpi_processor *pr, int event_flag)
>>>>}
>>>>if (ret < 0)
>>>>return (ret);
>>>> -  else
>>>> +  else {
>>>> +  unsigned int ppc = (unsigned int)pr->performance_platform_limit;
>>>> +
>>>> +  if (saved != ppc) {
>>>> +  struct cpufreq_policy *policy;
>>>> +
>>>> +  policy = cpufreq_cpu_get(pr->id);
>>>> +  if (likely(policy))
>>>> +  policy->user_policy.max =
>>>> +  pr->performance->states[ppc].
>>>> +  core_frequency * 1000;
>>>> +  cpufreq_cpu_put(policy);
>>>> +  }
>>>> +
>>>>return cpufreq_update_policy(pr->id);
>>>> +  }
>>>>  }
>>>>  
>>>>  int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
>>>>
>>
>>
>>


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 


Re: [PATCH] ACPI: update user_policy.max when _PPC updated

2013-06-05 Thread Joe Jin
On 06/06/13 04:40, Rafael J. Wysocki wrote:
> On Wednesday, June 05, 2013 08:52:52 AM Joe Jin wrote:
>> When _PPC changes dynamically, user_policy.max is not updated, which
>> prevents the CPU from running at the highest frequency.
> 
> Why should the user setting be always related to the current maximum available
> frequency?  What if the user sets the limit for power capping purposes?

cpufreq_update_policy() takes policy->max from user_policy.max:

1782 int cpufreq_update_policy(unsigned int cpu)
1783 {
[...]
1800 policy.min = data->user_policy.min;
1801 policy.max = data->user_policy.max;
1802 policy.policy = data->user_policy.policy;
1803 policy.governor = data->user_policy.governor;
[...]
1819 ret = __cpufreq_set_policy(data, &policy);
[...]

Writing /sys/devices/system/cpu/cpu$/cpufreq/scaling_max_freq updates both
policy->max and user_policy.max, so I think a _PPC change also needs to update
these two?

Thanks,
Joe
 
> 
> Rafael
> 
> 
>> Signed-off-by: Joe Jin 
>> Cc: Rafael J. Wysocki 
>> Cc: Viresh Kumar 
>> ---
>>  drivers/acpi/processor_perflib.c | 17 -
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/processor_perflib.c 
>> b/drivers/acpi/processor_perflib.c
>> index e854582..e01aa7d 100644
>> --- a/drivers/acpi/processor_perflib.c
>> +++ b/drivers/acpi/processor_perflib.c
>> @@ -180,6 +180,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, 
>> int status)
>>  int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int 
>> event_flag)
>>  {
>>  int ret;
>> +unsigned int saved = (unsigned int)pr->performance_platform_limit;
>>  
>>  if (ignore_ppc) {
>>  /*
>> @@ -204,8 +205,22 @@ int acpi_processor_ppc_has_changed(struct 
>> acpi_processor *pr, int event_flag)
>>  }
>>  if (ret < 0)
>>  return (ret);
>> -else
>> +else {
>> +unsigned int ppc = (unsigned int)pr->performance_platform_limit;
>> +
>> +if (saved != ppc) {
>> +struct cpufreq_policy *policy;
>> +
>> +policy = cpufreq_cpu_get(pr->id);
>> +if (likely(policy))
>> +policy->user_policy.max =
>> +pr->performance->states[ppc].
>> +            core_frequency * 1000;
>> +cpufreq_cpu_put(policy);
>> +}
>> +
>>  return cpufreq_update_policy(pr->id);
>> +}
>>  }
>>  
>>  int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
>>




[PATCH] ACPI: update user_policy.max when _PPC updated

2013-06-04 Thread Joe Jin
When _PPC changes dynamically, user_policy.max is not updated, which
prevents the CPU from running at the highest frequency.

Signed-off-by: Joe Jin <joe@oracle.com>
Cc: Rafael J. Wysocki <r...@sisk.pl>
Cc: Viresh Kumar <viresh.ku...@linaro.org>
---
 drivers/acpi/processor_perflib.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index e854582..e01aa7d 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -180,6 +180,7 @@ static void acpi_processor_ppc_ost(acpi_handle handle, int status)
 int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
 {
 	int ret;
+	unsigned int saved = (unsigned int)pr->performance_platform_limit;
 
 	if (ignore_ppc) {
 		/*
@@ -204,8 +205,22 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr, int event_flag)
 	}
 	if (ret < 0)
 		return (ret);
-	else
+	else {
+		unsigned int ppc = (unsigned int)pr->performance_platform_limit;
+
+		if (saved != ppc) {
+			struct cpufreq_policy *policy;
+
+			policy = cpufreq_cpu_get(pr->id);
+			if (likely(policy))
+				policy->user_policy.max =
+					pr->performance->states[ppc].
+					core_frequency * 1000;
+			cpufreq_cpu_put(policy);
+		}
+
 		return cpufreq_update_policy(pr->id);
+	}
 }
 
 int acpi_processor_get_bios_limit(int cpu, unsigned int *limit)
-- 
1.8.1.4



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-12-18 Thread Joe Jin
Hi Yijing,

Thanks for the reference. The patch looks good to me, but I have no chance
to test it in the customer's environment.

Best Regards,
Joe

On 12/19/12 13:52, Yijing Wang wrote:
> On 2012/12/19 11:04, Joe Jin wrote:
>> Hi all,
>>
>> I backported the MPS commits and asked the customer to pass
>> "pci=pcie_bus_peer2peer" to the kernel to limit MPS to 128 bytes, and the
>> issue disappeared; it sounds like this is a BIOS bug.
>>
> 
> Hi Joe,
>    I found a similar problem when doing PCI hotplug; the discussion is
> here: http://marc.info/?l=linux-pci&m=134810569924220&w=2.
> Based on Bjorn's suggestion, we are trying to improve the Linux kernel so
> that this problem is easier to debug. Jon sent out the first version of the
> patch: http://marc.info/?l=linux-pci&m=135002016005274&w=2.
> I think we can go further here:
> http://marc.info/?l=linux-pci&m=135115581307869&w=2. I hope this information
> can help you.
> 
> Thanks!
> Yijing.
> 
>> Thanks for all of your help.
>>
>> Best Regards,
>> Joe
>>
>> On 11/29/12 23:52, Fujinaka, Todd wrote:
>>> Someone else pointed this out to me locally. If you have a non-client BIOS, 
>>> you should be able to set the MaxPayloadSize using setpci. You have to make 
>>> sure that you're being consistent throughout all the associated links.
>>>
>>> Todd Fujinaka
>>> Technical Marketing Engineer
>>> LAN Access Division (LAD)
>>> Intel Corporation
>>> todd.fujin...@intel.com
>>> (503) 712-4565
>>>
>>>
>>> -Original Message-
>>> From: Ethan Zhao [mailto:ethan.ker...@gmail.com] 
>>> Sent: Wednesday, November 28, 2012 7:10 PM
>>> To: Fujinaka, Todd
>>> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; net...@vger.kernel.org; 
>>> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>
>>> Joe,
>>> Possibly your customer is running a kernel without source code on a
>>> platform whose vendor is unwilling to fix the BIOS issue (is it an HP/Dell
>>> server?).
>>> Anyway, to see whether it is a payload issue, you could change the payload
>>> size of those devices with the setpci tool and set the link retrain bit to
>>> trigger link retraining, to debug the issue and identify the root cause.
>>> I think that is much easier than modifying the BIOS or the NIC's EEPROM.
>>>
>>> e.g.
>>>set device control register to 0f 00   (128 bytes payload size)
>>>#   setpci -v -s 00:02.0 98.w=000f
>>>set device link control register to 60h (retrain the link)
>>>#  setpci -v -s 00:02.0 a0.b=60
>>>
>>>   Hope it works,  Just my 2 cents.
>>>
>>> ethan.z...@oracle.com
>>>
>>> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd <todd.fujin...@intel.com> wrote:
>>>> The only EEPROM I know about or can speak to is the one attached to the 
>>>> 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>>>
>>>> Todd Fujinaka
>>>> Technical Marketing Engineer
>>>> LAN Access Division (LAD)
>>>> Intel Corporation
>>>> todd.fujin...@intel.com
>>>> (503) 712-4565
>>>>
>>>>
>>>> -Original Message-
>>>> From: Joe Jin [mailto:joe@oracle.com]
>>>> Sent: Wednesday, November 28, 2012 12:31 AM
>>>> To: Ben Hutchings
>>>> Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org; 
>>>> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>>
>>>> On 11/28/12 02:10, Ben Hutchings wrote:
>>>>> On Tue, 2012-11-27 at 17:32 +, Fujinaka, Todd wrote:
>>>>>> Forgive me if I'm being too repetitious as I think some of this has 
>>>>>> been mentioned in the past.
>>>>>>
>>>>>> We (and by we I mean the Ethernet part and driver) can only change 
>>>>>> the advertised availability of a larger MaxPayloadSize. The size is 
>>>>>> negotiated by both sides of the link when the link is established.
>>>>>> The driver should not change the size of the link as it would be 
>>>>>> poking at registers outside of its scope and is controlled by the 
>>>>>> upstream bridge (not us).
>>>>> [...]
>>>>>
>>>>> MaxPayloadSize (MPS) is not negotiated between devices but is 
>>>>> programm

Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-12-18 Thread Joe Jin
Hi all,

I backported the MPS commits and asked the customer to pass "pci=pcie_bus_peer2peer"
to the kernel to limit MPS to 128 bytes, and the issue disappeared; it sounds like
this is a BIOS bug.

Thanks for all of your help.

Best Regards,
Joe

On 11/29/12 23:52, Fujinaka, Todd wrote:
> Someone else pointed this out to me locally. If you have a non-client BIOS, 
> you should be able to set the MaxPayloadSize using setpci. You have to make 
> sure that you're being consistent throughout all the associated links.
> 
> Todd Fujinaka
> Technical Marketing Engineer
> LAN Access Division (LAD)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
> 
> 
> -Original Message-
> From: Ethan Zhao [mailto:ethan.ker...@gmail.com] 
> Sent: Wednesday, November 28, 2012 7:10 PM
> To: Fujinaka, Todd
> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; net...@vger.kernel.org; 
> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
> 
> Joe,
> Possibly your customer is running a kernel without source code on a
> platform whose vendor is unwilling to fix the BIOS issue (is it an HP/Dell
> server?).
> Anyway, to see whether it is a payload issue, you could change the payload
> size of those devices with the setpci tool and set the link retrain bit to
> trigger link retraining, to debug the issue and identify the root cause.
> I think that is much easier than modifying the BIOS or the NIC's EEPROM.
> 
> e.g.
>set device control register to 0f 00   (128 bytes payload size)
>#   setpci -v -s 00:02.0 98.w=000f
>set device link control register to 60h (retrain the link)
>#  setpci -v -s 00:02.0 a0.b=60
> 
>   Hope it works,  Just my 2 cents.
> 
> ethan.z...@oracle.com
> 
> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd <todd.fujin...@intel.com> wrote:
>> The only EEPROM I know about or can speak to is the one attached to the 
>> 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>
>> Todd Fujinaka
>> Technical Marketing Engineer
>> LAN Access Division (LAD)
>> Intel Corporation
>> todd.fujin...@intel.com
>> (503) 712-4565
>>
>>
>> -Original Message-
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Wednesday, November 28, 2012 12:31 AM
>> To: Ben Hutchings
>> Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org; 
>> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>
>> On 11/28/12 02:10, Ben Hutchings wrote:
>>> On Tue, 2012-11-27 at 17:32 +, Fujinaka, Todd wrote:
>>>> Forgive me if I'm being too repetitious as I think some of this has 
>>>> been mentioned in the past.
>>>>
>>>> We (and by we I mean the Ethernet part and driver) can only change 
>>>> the advertised availability of a larger MaxPayloadSize. The size is 
>>>> negotiated by both sides of the link when the link is established.
>>>> The driver should not change the size of the link as it would be 
>>>> poking at registers outside of its scope and is controlled by the 
>>>> upstream bridge (not us).
>>> [...]
>>>
>>> MaxPayloadSize (MPS) is not negotiated between devices but is 
>>> programmed by the system firmware (at least for devices present at 
>>> boot - the kernel may be responsible in case of hotplug).  You can 
>>> use the kernel parameter 'pci=pcie_bus_perf' (or one of several 
>>> others) to set a policy that overrides this, but no policy will allow 
>>> setting MPS above the device's MaxPayloadSizeSupported (MPSS).
>>>
>>
>> Ben,
>>
>> Unfortunately I'm using a 3.0.x kernel, which does not include this.
>> So I'm trying to use ethtool to modify it in the EEPROM to see whether that helps.
>>
>>
>> Todd, I'll review the MaxPayload settings of all devices. But I should say
>> that if they mismatch, the customer cannot modify them from the BIOS because
>> there is no entry for it there. To test it, we have to find a way to verify
>> whether this is the root cause, so I still need to find the offset in the EEPROM.
>>
>> Thanks in advance,
>> Joe
>>




Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-12-18 Thread Joe Jin
Hi all,

I backported the MPS commits and asked the customer to pass pci=pcie_bus_peer2peer to the kernel
to limit MPS to 128 bytes, and the issue disappeared; it sounds like this is a BIOS bug.

Thanks for all of your help.

Best Regards,
Joe

On 11/29/12 23:52, Fujinaka, Todd wrote:
 Someone else pointed this out to me locally. If you have a non-client BIOS, 
 you should be able to set the MaxPayloadSize using setpci. You have to make 
 sure that you're being consistent throughout all the associated links.
 
 Todd Fujinaka
 Technical Marketing Engineer
 LAN Access Division (LAD)
 Intel Corporation
 todd.fujin...@intel.com
 (503) 712-4565
 
 
 -Original Message-
 From: Ethan Zhao [mailto:ethan.ker...@gmail.com] 
 Sent: Wednesday, November 28, 2012 7:10 PM
 To: Fujinaka, Todd
 Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; net...@vger.kernel.org; 
 e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
 Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
 
 Joe,
 Possibly your customer is running a kernel without source code on a
 platform whose vendor is unwilling to fix the BIOS issue (is that an HP/Dell
 server?).
 Anyway, to see whether it is a payload issue or not, you could change the payload
 size of those devices with the setpci tool and set the link retrain bit to
 trigger link retraining, so you can debug the issue and identify the root cause.
 I think it is much easier than modifying the BIOS or the NIC's EEPROM.
 
 e.g.
set device control register to 0f 00   (128 bytes payload size)
#   setpci -v -s 00:02.0 98.w=000f
set device link control register to 60h (retrain the link)
#  setpci -v -s 00:02.0 a0.b=60
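The two values in this example can be decoded with a short sketch. It assumes the standard PCIe register layouts: in Device Control, bits 3:0 are the error-reporting enables, bit 4 Relaxed Ordering, bits 7:5 Max_Payload_Size, bit 11 No Snoop and bits 14:12 Max_Read_Request_Size; in Link Control, bit 5 is Retrain Link and bit 6 Common Clock Configuration. The helper names are hypothetical, and the DevCtl value 0x581f is an assumption reconstructed from the NIC's lspci flags (all error reports on, RlxdOrd+, NoSnoop+, MaxPayload 128, MaxReadReq 4096). Writing 000f replaces the whole register, so it also clears Relaxed Ordering and No Snoop and sets MaxReadReq to 128 bytes; `set_mps()` instead changes only bits 7:5. The offsets 0x98 and 0xa0 imply a PCIe capability at 0x90 (Device Control sits at capability + 0x08, Link Control at + 0x10); on a device whose capability is at [e0], like the 82571EB in this thread, the same registers would be at 0xe8 and 0xf0. Per the PCIe specification, Retrain Link only has an effect on the downstream port of a link, such as a root port.

```python
"""Decode/modify PCIe Device Control and Link Control values (sketch)."""

# Standard PCIe capability register fields (bit positions from the PCIe spec).
DEVCTL_ERR_REPORT = 0x000F                    # bits 3:0   error reporting enables
DEVCTL_RELAXED_ORDERING = 1 << 4              # bit 4
DEVCTL_MPS_SHIFT = 5
DEVCTL_MPS_MASK = 0x7 << DEVCTL_MPS_SHIFT     # bits 7:5   Max_Payload_Size
DEVCTL_NO_SNOOP = 1 << 11                     # bit 11
DEVCTL_MRRS_SHIFT = 12
DEVCTL_MRRS_MASK = 0x7 << DEVCTL_MRRS_SHIFT   # bits 14:12 Max_Read_Request_Size
LNKCTL_RETRAIN = 1 << 5                       # bit 5  Retrain Link
LNKCTL_COMMON_CLOCK = 1 << 6                  # bit 6  Common Clock Configuration


def size_from_code(code: int) -> int:
    """3-bit size encoding -> bytes: 0=128, 1=256, ..., 5=4096."""
    if not 0 <= code <= 5:
        raise ValueError(f"reserved size encoding {code}")
    return 128 << code


def code_from_size(size: int) -> int:
    """Inverse of size_from_code(); rejects sizes other than 128..4096 powers of two."""
    code = size.bit_length() - 8    # 128 == 1 << 7 has bit_length 8
    if not 0 <= code <= 5 or size != 128 << code:
        raise ValueError(f"invalid payload size {size}")
    return code


def decode_devctl(value: int) -> dict:
    """Split a Device Control value into the fields discussed in this thread."""
    return {
        "err_report": value & DEVCTL_ERR_REPORT,
        "relaxed_ordering": bool(value & DEVCTL_RELAXED_ORDERING),
        "no_snoop": bool(value & DEVCTL_NO_SNOOP),
        "mps": size_from_code((value & DEVCTL_MPS_MASK) >> DEVCTL_MPS_SHIFT),
        "mrrs": size_from_code((value & DEVCTL_MRRS_MASK) >> DEVCTL_MRRS_SHIFT),
    }


def set_mps(devctl: int, size: int) -> int:
    """Read-modify-write: replace only bits 7:5, keep the other 16-bit fields."""
    kept = devctl & ~DEVCTL_MPS_MASK & 0xFFFF
    return kept | (code_from_size(size) << DEVCTL_MPS_SHIFT)


if __name__ == "__main__":
    print(decode_devctl(0x000F))                          # what "98.w=000f" programs
    print(0x60 == LNKCTL_RETRAIN | LNKCTL_COMMON_CLOCK)   # what "a0.b=60" sets
    print(hex(set_mps(0x581F, 256)))                      # hypothetical DevCtl, MPS -> 256
```

With the reconstructed value, a read-modify-write only moves MaxPayload from 128 to 256 bytes (0x581f becomes 0x583f), whereas writing 000f to such a device would also drop MaxReadReq from 4096 to 128 bytes.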
 
   Hope it works. Just my 2 cents.
 
 ethan.z...@oracle.com
 
 On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd todd.fujin...@intel.com 
 wrote:
 The only EEPROM I know about or can speak to is the one attached to the 
 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.

 Todd Fujinaka
 Technical Marketing Engineer
 LAN Access Division (LAD)
 Intel Corporation
 todd.fujin...@intel.com
 (503) 712-4565


 -Original Message-
 From: Joe Jin [mailto:joe@oracle.com]
 Sent: Wednesday, November 28, 2012 12:31 AM
 To: Ben Hutchings
 Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org; 
 e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
 Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

 On 11/28/12 02:10, Ben Hutchings wrote:
 On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
 Forgive me if I'm being too repetitious as I think some of this has 
 been mentioned in the past.

 We (and by we I mean the Ethernet part and driver) can only change 
 the advertised availability of a larger MaxPayloadSize. The size is 
 negotiated by both sides of the link when the link is established.
 The driver should not change the size of the link as it would be 
 poking at registers outside of its scope and is controlled by the 
 upstream bridge (not us).
 [...]

 MaxPayloadSize (MPS) is not negotiated between devices but is 
 programmed by the system firmware (at least for devices present at 
 boot - the kernel may be responsible in case of hotplug).  You can 
 use the kernel parameter 'pci=pcie_bus_perf' (or one of several 
 others) to set a policy that overrides this, but no policy will allow 
 setting MPS above the device's MaxPayloadSizeSupported (MPSS).


 Ben,

 Unfortunately I'm using a 3.0.x kernel, which does not include this.
 So I'm trying to use ethtool to modify it in the EEPROM to see whether that helps.


 Todd, I'll review the MaxPayload settings of all devices, but I should point out that
 if they mismatch, the customer cannot change them from the BIOS because there is no
 entry for it there. To verify whether this is the root cause, we still need to find
 the offset in the EEPROM.

 Thanks in advance,
 Joe





Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-12-18 Thread Joe Jin
Hi Yijing,

Thanks for the reference. The patch looks good to me, but I have no chance
to test it in the customer's environment.

Best Regards,
Joe

On 12/19/12 13:52, Yijing Wang wrote:
 On 2012/12/19 11:04, Joe Jin wrote:
 Hi all,

 I backported the MPS commits and asked the customer to pass pci=pcie_bus_peer2peer to the
 kernel to limit MPS to 128, and the issue disappeared. It sounds like this is a BIOS bug.

 
 Hi Joe,
 I found a similar problem when doing PCI hotplug; the discussion is
 here: http://marc.info/?l=linux-pci&m=134810569924220&w=2.
 We are trying to improve the Linux kernel so this problem is easier to debug, based on
 Bjorn's suggestion. Jon sent out the first version of the patch:
 http://marc.info/?l=linux-pci&m=135002016005274&w=2.
 I think we can go further here:
 http://marc.info/?l=linux-pci&m=135115581307869&w=2. I hope this information
 can help you.
 
 Thanks!
 Yijing.
 




Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-28 Thread Joe Jin
On 11/28/12 02:10, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>> Forgive me if I'm being too repetitious as I think some of this has
>> been mentioned in the past.
>>
>> We (and by we I mean the Ethernet part and driver) can only change the
>> advertised availability of a larger MaxPayloadSize. The size is
>> negotiated by both sides of the link when the link is established. The
>> driver should not change the size of the link as it would be poking at
>> registers outside of its scope and is controlled by the upstream
>> bridge (not us).
> [...]
> 
> MaxPayloadSize (MPS) is not negotiated between devices but is programmed
> by the system firmware (at least for devices present at boot - the
> kernel may be responsible in case of hotplug).  You can use the kernel
> parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy
> that overrides this, but no policy will allow setting MPS above the
> device's MaxPayloadSizeSupported (MPSS).
> 
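A simplified model of the policies Ben mentions, assuming the behaviour documented for the kernel's `pci=` parameters: `pcie_bus_safe` sets every device to the largest MPS supported by all devices below the root complex, and `pcie_bus_peer2peer` sets every device to 128 bytes. The function and device names are hypothetical, and the sketch does not model `pcie_bus_perf` (which works per parent bus) or hotplug; it only illustrates why no policy ends up above any device's MPSS.

```python
"""Simplified model of two kernel MPS policies (illustration only)."""
from typing import Dict


def choose_mps(mpss_by_device: Dict[str, int], policy: str) -> Dict[str, int]:
    """Return the MPS in bytes each device would get under `policy`.

    `mpss_by_device` maps a device name to its MaxPayloadSizeSupported
    in bytes. Only "peer2peer" and "safe" are modelled; "perf" is not.
    """
    if policy == "peer2peer":
        target = 128                           # every PCIe device supports 128 B
    elif policy == "safe":
        target = min(mpss_by_device.values())  # largest size all devices support
    else:
        raise ValueError(f"policy {policy!r} is not modelled in this sketch")
    # No policy may program a device above its own MPSS.
    return {dev: min(target, mpss) for dev, mpss in mpss_by_device.items()}


if __name__ == "__main__":
    # Hypothetical hierarchy: root port, switch and the 82571EB (MPSS 256 B).
    hierarchy = {"root_port": 256, "switch": 512, "82571EB": 256}
    print(choose_mps(hierarchy, "safe"))       # every device gets 256
    print(choose_mps(hierarchy, "peer2peer"))  # every device gets 128
```

If any device in the hierarchy supports only 128 bytes, `safe` also collapses to 128, the same value `peer2peer` forces unconditionally.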

Ben,

Unfortunately I'm using a 3.0.x kernel, which does not include this.
So I'm trying to use ethtool to modify it in the EEPROM to see whether that helps.


Todd, I'll review the MaxPayload settings of all devices, but I should point out that
if they mismatch, the customer cannot change them from the BIOS because there is no
entry for it there. To verify whether this is the root cause, we still need to find
the offset in the EEPROM.

Thanks in advance,
Joe



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-26 Thread Joe Jin
On 11/27/12 00:23, Fujinaka, Todd wrote:
> If you look at the previous section, DevCap, you'll see that it's
> correctly advertising 256 bytes but the system is negotiating 128 for
> the link to the Ethernet controller. Things on the "other" side of the
> link are controlled outside of the e1000 driver.
> 
> Tushar's first suggestion was to check the PCIe payload settings in the
> entire chain. Have you done that? Mismatches will cause hangs.
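The chain check can be sketched as follows, assuming you have already read the raw Device Capabilities and Device Control values (for example with setpci) for every device on the path from the root port to the NIC. It decodes the supported size from DevCap bits 2:0 and the programmed size from DevCtl bits 7:5, then reports any device programmed above its own MPSS and any disagreement along the path. The function names and register values are hypothetical.

```python
"""Flag MPS mismatches along a PCIe path (sketch, hypothetical inputs)."""
from typing import List, Tuple


def mps_bytes(code: int) -> int:
    """3-bit encoding -> bytes: 0=128, 1=256, ..., 5=4096; 6 and 7 are reserved."""
    if not 0 <= code <= 5:
        raise ValueError(f"reserved size encoding {code}")
    return 128 << code


def check_chain(chain: List[Tuple[str, int, int]]) -> List[str]:
    """`chain` holds (name, devcap, devctl) from the root port down to the NIC."""
    problems = []
    programmed = []
    for name, devcap, devctl in chain:
        mpss = mps_bytes(devcap & 0x7)          # DevCap bits 2:0 (supported)
        mps = mps_bytes((devctl >> 5) & 0x7)    # DevCtl bits 7:5 (programmed)
        programmed.append(mps)
        if mps > mpss:
            problems.append(f"{name}: MPS {mps} exceeds MPSS {mpss}")
    if len(set(programmed)) > 1:
        sizes = ", ".join(f"{name}={mps}" for (name, _, _), mps in zip(chain, programmed))
        problems.append(f"MPS mismatch along chain: {sizes}")
    return problems


if __name__ == "__main__":
    # Hypothetical register values: root port programmed to 256 B, NIC to 128 B.
    chain = [("root_port", 0x0001, 0x0020), ("82571EB", 0x0001, 0x0000)]
    for problem in check_chain(chain):
        print(problem)   # MPS mismatch along chain: root_port=256, 82571EB=128
```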

Hi Todd,

So far I still need to find out how to modify the MaxPayload size. Since the BIOS has
no entry to change it, I have to use ethtool, so now I need the offset of the
MaxPayload size in the EEPROM. I tried to find it in Intel's online documentation
but failed. Any idea?

Thanks in advance,
Joe


Re: 82571EB: Detected Hardware Unit Hang

2012-11-20 Thread Joe Jin
On 11/20/12 16:59, Dave, Tushar N wrote:
> Have you powered off the system completely after modifying the EEPROM? If not,
> please do so.

Hi Tushar,

It doesn't seem to work for me. Would you please help check what is wrong with my
steps?

Original eeprom dump:

# ethtool -e eth3 | head -8
Offset  Values
--  --
0x0000  00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010  57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020  08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030  f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06
                    ^^
0x0040  08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s :52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
            ^^^^^^^^^^^^^^^^^^^^
<--snip-->

# ethtool eth3
Settings for eth3:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x0007 (7)
    Link detected: yes

# ethtool -E eth3 magic 0x10a48086 offset 0x34 value 0xa7
# ethtool -e eth3 | head -8
Offset  Values
--  --
0x0000  00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010  57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020  08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030  f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06
                    ^^ <== a6 --> a7
0x0040  08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# reboot

# ethtool -e eth3 | head -8
Offset  Values
--  --
0x0000  00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010  57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020  08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030  f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06 
0x0040  08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s :52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
            ^^^^^^^^^^^^^^^^^^^^
        DevSta: CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
<--snip-->

#  ethtool -E eth3 magic 0x10a48086 offset 0x35 value 0x17

# ethtool -e eth3 | head -8
Offset  Values
--  --
0x0000  00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff
0x0010  57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1
0x0020  08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01
0x0030  f6 6c b0 37 a6 17 03 84 83 07 00 00 03 c3 02 06
                       ^^ <== 07 -> 17
0x0040  08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# reboot

# ethtool -e eth3 | head -8
Offset  Values
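The effect of the two writes on EEPROM word 0x1A can be worked out with a small sketch. It assumes the word is stored little-endian (byte 0x34 is the low byte, 0x35 the high byte) and that, as the second dump shows (`a6 17`), the low byte had been restored to a6 before the second write. It XORs the old and new word values to list the bits that changed; the helper names are hypothetical.

```python
"""Which bits of EEPROM word 0x1A did each ethtool -E write change? (sketch)"""


def set_byte(word: int, byte_index: int, value: int) -> int:
    """Replace one byte of a 16-bit little-endian word (index 0 = low byte)."""
    shift = 8 * byte_index
    return (word & ~(0xFF << shift) & 0xFFFF) | (value << shift)


def changed_bits(old: int, new: int) -> list:
    """Bit positions (0..15) that differ between two 16-bit values."""
    diff = old ^ new
    return [bit for bit in range(16) if (diff >> bit) & 1]


ORIGINAL = 0x07A6                       # bytes a6 07 at offsets 0x34/0x35
WRITES = {
    "offset 0x34 value 0xa7": set_byte(ORIGINAL, 0, 0xA7),
    "offset 0x35 value 0x17": set_byte(ORIGINAL, 1, 0x17),
}

if __name__ == "__main__":
    for label, new in WRITES.items():
        print(f"{label}: 0x{ORIGINAL:04x} -> 0x{new:04x}, "
              f"changed bits {changed_bits(ORIGINAL, new)}, bit 8 = {(new >> 8) & 1}")
```

By this arithmetic the first write sets bit 0 and the second sets bit 12; neither touches bit 8, which is already 1 in the original value.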

Re: 82571EB: Detected Hardware Unit Hang

2012-11-18 Thread Joe Jin
On 11/16/12 04:26, Dave, Tushar N wrote:
>> Would you please help to find the offset of max payload size in eeprom?
>> I'd like to have a try to modify it by ethtool.
> 
> It is defined using bit 8 of word 0x1A.
> Bit value 0 = 128B, bit value 1 = 256B

Hi Tushar,

I checked one of my servers whose Max Payload Size is 128:

# lspci -vvv -s 52:00.1
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
    Subsystem: Intel Corporation PRO/1000 PT Quad Port Server Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 266
    Region 0: Memory at dfea (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at dfe8 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 6020 [size=32]
    [virtual] Expansion ROM at d812 [disabled] [size=128K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: fee0  Data: 409a
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr-
        AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-15-17-ff-ff-16-ed-86
    Kernel driver in use: e1000e
    Kernel modules: e1000e

And eeprom dump as below:

Offset  Values
--  --
0x  00 15 17 16 ed 86 24 05 ff ff a2 50 ff ff ff ff 
0x0010  57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020  08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030  f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06 
0x0040  08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0060  00 01 00 40 1e 12 07 40 00 01 00 40 ff ff ff ff 


If I did not misunderstand, the value of offset 0x1a is 0x07a6, then the bit 8 
is 1, but 
my NIC's MPS is 128b, anything I'm wrong? 

Thanks,
Joe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

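A minimal C sketch of the EEPROM arithmetic in Joe's question, assuming the dump holds little-endian 16-bit words. The dump itself supports that assumption: bytes `86 80` at offset 0x18 read as Intel's vendor ID 0x8086. Word 0x1A therefore sits at bytes 0x34/0x35 (`a6 07`), giving 0x07a6 with bit 8 set. The helper names are illustrative only. Note that the lspci output reports `DevCap: MaxPayload 256 bytes` but `DevCtl: ... MaxPayload 128 bytes`; one plausible reading, not confirmed in the thread, is that the EEPROM bit sets the advertised capability while the value in use is whatever firmware programs into Device Control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 16-bit EEPROM word `word` from a raw byte dump (as printed by
 * `ethtool -e`), assuming little-endian words: low byte at 2*word,
 * high byte at 2*word + 1. */
static uint16_t eeprom_word(const uint8_t *dump, size_t len, size_t word)
{
    assert(dump != NULL && 2 * word + 1 < len);
    return (uint16_t)(dump[2 * word] | (dump[2 * word + 1] << 8));
}

/* Per the thread: bit 8 of word 0x1A selects the max payload size,
 * 0 = 128 bytes, 1 = 256 bytes. */
static unsigned eeprom_mps_bytes(uint16_t word_0x1a)
{
    return (word_0x1a & (1u << 8)) ? 256u : 128u;
}
```

With the 0x0030 row of the dump, bytes 0x34 and 0x35 are `a6 07`, so `eeprom_word(dump, len, 0x1a)` returns 0x07a6 and `eeprom_mps_bytes(0x07a6)` returns 256, which matches the DevCap line rather than the DevCtl line.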

Re: 82571EB: Detected Hardware Unit Hang

2012-11-14 Thread Joe Jin
On 11/14/12 11:45, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Tuesday, November 13, 2012 6:48 PM
>> To: Dave, Tushar N
>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; Mary Mcgrath
>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>
>> On 11/09/12 04:35, Dave, Tushar N wrote:
>>> All devices in path from root complex to 82571, should have *same* max
>> payload size otherwise it can cause hang.
>>> Can you double check this?
>>
>> Hi Tushar,
>>
>> Checked with hardware vendor and they said no way to modify the max
>> payload size from BIOS, can I modify it from driver side?
> 
> If you want to change value for 82571 device you can do it from eeprom but 
> for other upstream devices I am not sure. I will check with my team.

Hi Tushar,

Would you please help me find the offset of the max payload size in the EEPROM?
I'd like to try modifying it with ethtool.

Thanks in advance,
Joe


Re: 82571EB: Detected Hardware Unit Hang

2012-11-13 Thread Joe Jin
On 11/09/12 04:35, Dave, Tushar N wrote:
> All devices in path from root complex to 82571, should have *same* max 
> payload size otherwise it can cause hang. 
> Can you double check this?

Hi Tushar,

Checked with the hardware vendor; they said there is no way to modify the max payload
size from the BIOS. Can I modify it from the driver side?

Thanks,
Joe

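Tushar's rule, that every device from the root complex down to the 82571 must use the same max payload size, can be checked mechanically. In the PCIe Device Control register, Max_Payload_Size is bits 7:5 and encodes 128 << n bytes; Linux's `pci_regs.h` names the register `PCI_EXP_DEVCTL` and the field mask `PCI_EXP_DEVCTL_PAYLOAD` (0x00e0). The sketch below assumes one DevCtl value per device along the path has already been collected, root port first; the function names and example values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCIe Device Control register, bits 7:5 = Max_Payload_Size.
 * Encoding: 0 -> 128 B, 1 -> 256 B, ..., 5 -> 4096 B (6 and 7 are reserved). */
static unsigned devctl_mps_bytes(uint16_t devctl)
{
    return 128u << ((devctl >> 5) & 0x7);
}

/* Returns 1 if every device on the path (root port first) is programmed
 * with the same MPS as the first one, 0 on any mismatch. */
static int path_mps_consistent(const uint16_t *devctl, size_t n)
{
    assert(devctl != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (devctl_mps_bytes(devctl[i]) != devctl_mps_bytes(devctl[0]))
            return 0;
    return 1;
}
```

For example, a hypothetical path of `{0x2830, 0x2830, 0x2810}` decodes to 256, 256 and 128 bytes, so `path_mps_consistent()` returns 0: the mismatched situation Tushar describes.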

Re: 82571EB: Detected Hardware Unit Hang

2012-11-08 Thread Joe Jin
On 11/09/12 04:35, Dave, Tushar N wrote:
> Are you sure this is not similar issue as before that you reported.
> i.e. 

Tushar,

Thanks for your quick response. I'll check with the customer whether they can modify the
max payload size from the BIOS; this time the issue hit on an HP server.

Thanks again,
Joe

> On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote:
>> > I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when 
>> > doing scp test. this issue is easy do reproduced on SUN FIRE X2270 M2, 
>> > just copy a big file (>500M) from another server will hit it at once.
> All devices in path from root complex to 82571, should have *same* max 
> payload size otherwise it can cause hang. 
> Can you double check this?
> 


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 


82571EB: Detected Hardware Unit Hang

2012-11-07 Thread Joe Jin
Hi list,

I have a customer who reported "82571EB Detected Hardware Unit Hang" on an HP ProLiant
DL360 G6; the server has to be rebooted to recover:

e1000e 0000:06:00.1: eth3: Detected Hardware Unit Hang:
  TDH  <1a>
  TDT  <1a>
  next_to_use  <1a>
  next_to_clean<18>
buffer_info[next_to_clean]:
  time_stamp   <10047a74e>
  next_to_watch<18>
  jiffies  <10047a88c>
  next_to_watch.status <1>
MAC Status <80383>
PHY Status <792d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status<3000>
PCI Status <10>

With the newer e1000e driver 2.0.0.1, the issue is still reproducible.

Device info:
06:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
Controller (Copper) (rev 06)
06:00.1 0200: 8086:10bc (rev 06)

I compared the lspci output before and after the issue; the difference is below:
 06:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
Controller (Copper) (rev 06)
Subsystem: Hewlett-Packard Company NC364T PCI Express Quad Port Gigabit 
Server Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR- FastB2B- DisINTx-
-   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
<MAbort- >SERR- <PERR- INTx-
+   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
<MAbort- >SERR- <PERR- INTx+


Would you please help with it?

Thanks in advance,
Joe

-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

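The counters in the hang report above can be decoded with a little arithmetic. This sketch assumes a 256-entry transmit ring (hypothetical; the report does not show the ring size) and assumes the `PCI Status` field is the raw PCI Status register. Under those assumptions, `next_to_use` 0x1a minus `next_to_clean` 0x18 leaves 2 descriptors not yet cleaned, `jiffies - time_stamp` is 0x10047a88c - 0x10047a74e = 318 ticks, and 0x10 has only the Capabilities List bit set. Bit 3 of the same register is the Interrupt Status bit that lspci prints as `INTx`, the bit that flipped from `INTx-` to `INTx+` in the before/after diff. The helper and macro names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptors queued by the driver but not yet cleaned, on a ring of
 * `count` entries whose indices wrap back to 0. */
static unsigned ring_pending(unsigned next_to_use, unsigned next_to_clean,
                             unsigned count)
{
    assert(count > 0 && next_to_use < count && next_to_clean < count);
    return (next_to_use + count - next_to_clean) % count;
}

/* Ticks elapsed since `stamp`; unsigned subtraction stays correct even if
 * the counter has wrapped. uint64_t because the logged jiffies exceed 32 bits. */
static uint64_t ticks_elapsed(uint64_t now, uint64_t stamp)
{
    return now - stamp;
}

/* PCI Status register bits as shown by lspci. */
#define PCI_STATUS_BIT_INTX 0x08u  /* bit 3: Interrupt Status ("INTx") */
#define PCI_STATUS_BIT_CAP  0x10u  /* bit 4: Capabilities List ("Cap") */

static int pci_status_has(uint16_t status, unsigned bit)
{
    return (status & bit) != 0;
}
```

The modulo form of `ring_pending()` is what keeps the count right when `next_to_use` has wrapped past the end of the ring while `next_to_clean` has not.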

[PATCH] qla3xxx: Ensure request/response queue addr writes to the registers

2012-10-21 Thread Joe Jin
Before the request and response queue addresses are used, make sure they
have been written to the registers.

Signed-off-by: Joe Jin 
Cc: Jitendra Kalsaria 
Cc: Ron Mercer 
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index df09b1c..6407d0d 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2525,6 +2525,13 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
qdev->req_q_size =
(u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
 
+   qdev->rsp_q_size = NUM_RSP_Q_ENTRIES * sizeof(struct net_rsp_iocb);
+
+   /* The barrier is required to ensure request and response queue
+* addr writes to the registers.
+*/
+   wmb();
+
qdev->req_q_virt_addr =
pci_alloc_consistent(qdev->pdev,
 (size_t) qdev->req_q_size,
@@ -2536,8 +2543,6 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
return -ENOMEM;
}
 
-   qdev->rsp_q_size = NUM_RSP_Q_ENTRIES * sizeof(struct net_rsp_iocb);
-
qdev->rsp_q_virt_addr =
pci_alloc_consistent(qdev->pdev,
 (size_t) qdev->rsp_q_size,
-- 
1.7.11.7



[PATCH] qla3xxx: Ensure request/response queue addr writes to the registers

2012-10-18 Thread Joe Jin
Before the request and response queue addresses are used, make sure they
have been written to the registers.

Signed-off-by: Joe Jin 
Cc: Jitendra Kalsaria 
Cc: Ron Mercer 
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index df09b1c..f745ade 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2525,6 +2525,12 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
qdev->req_q_size =
(u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
 
+   /*
+* The barrier is required to ensure request and response queue
+* addr writes to the registers.
+*/
+   wmb();
+
qdev->req_q_virt_addr =
pci_alloc_consistent(qdev->pdev,
 (size_t) qdev->req_q_size,
-- 
1.7.11.7



Re: [PATCH] qla3xxx: Ensure req_q_phy_addr writes to the register

2012-10-17 Thread Joe Jin
On 10/18/12 01:45, Jitendra Kalsaria wrote:
> 
> 
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com] 
>> Sent: Tuesday, October 16, 2012 11:32 PM
>> To: Ron Mercer; Jitendra Kalsaria; Dept-Eng Linux Driver
>> Cc: netdev; linux-kernel; Greg Marsden
>> Subject: [PATCH] qla3xxx: Ensure req_q_phy_addr writes to the register
>>
>> Make sure req_q_phy_addr write to the register.
>>
>> Signed-off-by: Joe Jin 
>> Cc: Ron Mercer 
>> Cc: Jitendra Kalsaria 
>> ---
>> drivers/net/ethernet/qlogic/qla3xxx.c | 6 ++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c 
>> b/drivers/net/ethernet/qlogic/qla3xxx.c
>> index df09b1c..78b4cba 100644
>> --- a/drivers/net/ethernet/qlogic/qla3xxx.c
>> +++ b/drivers/net/ethernet/qlogic/qla3xxx.c
>> @@ -2525,6 +2525,12 @@ static int ql_alloc_net_req_rsp_queues(struct 
>> ql3_adapter *qdev)
>>  qdev->req_q_size =
>>  (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
>>
>> +/*
>> + * The barrier is required to ensure that req_q_phy_addr writes to
>> + * the memory.
>> + */
>> +wmb();
>> +
>>  qdev->req_q_virt_addr =
>>  pci_alloc_consistent(qdev->pdev,
>>   (size_t) qdev->req_q_size,
> 
> Your changes only take care of request queue but not response queue which 
> also need barrier.

Jiten,

Thanks for review!
The barrier is for the writel() calls of req_q_phy_addr and rsp_q_phy_addr in
ql_adapter_initialize(), so I think a single wmb() is enough, but I need to
update the comment. Any ideas?

Thanks,
Joe

> 
>   qdev->req_q_size =
>   (u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
> 
>   qdev->rsp_q_size = NUM_RSP_Q_ENTRIES * sizeof(struct net_rsp_iocb);
> 
>   wmb();
> 
> thanks,
>   Jiten
> 


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

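The point in the reply above, that one barrier can cover both queue addresses, reflects the usual property of a write barrier: it orders all earlier stores, not only the most recent one. As a user-space analogy only, the sketch below uses a C11 release/acquire fence pair in place of the kernel's `wmb()`. It does not model MMIO or `writel()` ordering, and the `queue_cfg` structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for adapter state: two queue addresses that must
 * be visible to a reader before the "ready" flag is. */
struct queue_cfg {
    uint64_t req_q_addr;
    uint64_t rsp_q_addr;
    atomic_int ready;
};

/* Producer: store both addresses, then ONE release fence, then the flag.
 * The single fence orders both earlier plain stores before the flag store. */
static void publish_queues(struct queue_cfg *c, uint64_t req, uint64_t rsp)
{
    c->req_q_addr = req;
    c->rsp_q_addr = rsp;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->ready, 1, memory_order_relaxed);
}

/* Consumer: once the flag is observed, an acquire fence makes both
 * addresses visible. Returns 1 and fills *req/*rsp, or 0 if not ready. */
static int read_queues(struct queue_cfg *c, uint64_t *req, uint64_t *rsp)
{
    assert(c != NULL && req != NULL && rsp != NULL);
    if (!atomic_load_explicit(&c->ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *req = c->req_q_addr;
    *rsp = c->rsp_q_addr;
    return 1;
}
```

Placing the fence after both address stores, rather than between them, is what lets one fence serve both; this mirrors the ordering Jitendra suggests, where both queue sizes are set before the single `wmb()`.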

[PATCH] qla3xxx: Ensure req_q_phy_addr writes to the register

2012-10-17 Thread Joe Jin
Make sure req_q_phy_addr is written to the register.

Signed-off-by: Joe Jin 
Cc: Ron Mercer 
Cc: Jitendra Kalsaria 
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index df09b1c..78b4cba 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2525,6 +2525,12 @@ static int ql_alloc_net_req_rsp_queues(struct ql3_adapter *qdev)
qdev->req_q_size =
(u32) (NUM_REQ_Q_ENTRIES * sizeof(struct ob_mac_iocb_req));
 
+   /*
+* The barrier is required to ensure that req_q_phy_addr writes to
+* the memory.
+*/
+   wmb();
+
qdev->req_q_virt_addr =
pci_alloc_consistent(qdev->pdev,
 (size_t) qdev->req_q_size,
-- 
1.7.11.7



Re: 82571EB: Detected Hardware Unit Hang

2012-07-14 Thread Joe Jin
On 07/15/12 11:42, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Thursday, July 12, 2012 9:34 PM
>> To: Dave, Tushar N
>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org
>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>
>> On 07/13/12 12:10, Dave, Tushar N wrote:
>>>> -----Original Message-----
>>>> From: Joe Jin [mailto:joe@oracle.com]
>>>> Sent: Thursday, July 12, 2012 4:46 PM
>>>> To: Dave, Tushar N
>>>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org
>>>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>>>
>>> Thanks for sending full dmesg log. I am still investigating. I think
>> this issue can occur if two PCIe link partner *i.e pcie bridge and pcie
>> device do not have same max payload size.
>>> I need 2 more info.
>>> 1) PBA number of the card.
>>
>> This is a remote server and I could not get this.
>>
>>> 2) full lspci -vvv output of entire system 'after you have changed max
>> payload size to 128'.
> 
> Somehow setting max payload to 256 from BIOS does not set this value for all 
> devices. I believe this is a BIOS bug.
> All devices in path from root complex to 82571, should have same max payload 
> size otherwise it can cause hang. When you set max payload to 128 from BIOS, 
> all device in path from root complex to 82571 got assigned same max payload 
> size. This resolves the issue.
> 
> I hope this helps.

Tushar,

Thanks a lot for your help; I will send this to the hardware engineer.

Regards,
Joe


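A related sketch for the fix Tushar describes (one MPS value for every device on the path): choose the largest payload size that every device on the path supports, i.e. the minimum of their Device Capabilities Max_Payload_Size_Supported fields. That field is bits 2:0 with the same 128 << n encoding (`PCI_EXP_DEVCAP_PAYLOAD` in Linux's `pci_regs.h`). This is roughly the policy of the kernel's `pci=pcie_bus_safe` boot option. The DevCap inputs are assumed to have been read already, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device Capabilities bits 2:0 = Max_Payload_Size_Supported.
 * 0 -> 128 B, 1 -> 256 B, ..., 5 -> 4096 B; 6 and 7 are reserved, and
 * this sketch conservatively treats them as 128 B. */
static unsigned devcap_mpss_bytes(uint32_t devcap)
{
    unsigned field = devcap & 0x7;
    if (field > 5)
        field = 0;
    return 128u << field;
}

/* Largest MPS that every device on the path supports: the minimum MPSS.
 * `devcap` holds one DevCap value per device, root port first. */
static unsigned common_mps_bytes(const uint32_t *devcap, size_t n)
{
    assert(devcap != NULL && n > 0);
    unsigned best = devcap_mpss_bytes(devcap[0]);
    for (size_t i = 1; i < n; i++) {
        unsigned v = devcap_mpss_bytes(devcap[i]);
        if (v < best)
            best = v;
    }
    return best;
}
```

For instance, an 82571 advertising `DevCap: MaxPayload 256 bytes` (field value 1) behind a hypothetical bridge that supports only 128 bytes yields 128 for the whole path, which is the same end state as selecting 128 in the BIOS.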

Re: 82571EB: Detected Hardware Unit Hang

2012-07-12 Thread Joe Jin
On 07/12/12 13:57, Dave, Tushar N wrote:
>> -----Original Message-----
>> From: Joe Jin [mailto:joe@oracle.com]
>> Sent: Wednesday, July 11, 2012 8:13 PM
>> To: Dave, Tushar N
>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org
>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>
>> On 07/12/12 11:07, Dave, Tushar N wrote:
>>>> -----Original Message-----
>>>> From: Joe Jin [mailto:joe@oracle.com]
>>>> Sent: Wednesday, July 11, 2012 7:58 PM
>>>> To: Dave, Tushar N
>>>> Cc: e1000-de...@lists.sf.net; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org
>>>> Subject: Re: 82571EB: Detected Hardware Unit Hang
>>>>
>>>> On 07/12/12 10:52, Dave, Tushar N wrote:
>>>>> What is the exact error messages in BIOS log?
>>>>
>>>> Error message from BIOS event log:
>>>> 07/12/12 05:54:00
>>>>PCI Express Non-Fatal Error
>>>>
>>>> Thanks,
>>>> Joe
>> Hi Tushar,
>>
>> Please find eeprom from attachment.
> 
> Do you have lspci -vvv dump of entire system before and after issue occurs? 
> If you have can you send it to me?
> 

Before:
05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
Controller (Copper) (rev 06)
Subsystem: Oracle Corporation x4 PCI-Express Quad Gigabit Ethernet UTP 
Low Profile Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR-

