Hi Bill,
This is really bad. I wish I had a system to repro your setup.
Is there anything special in your kernel .config (PREEMPT, for example)?
What distro is this on, btw?
thanks,
Murali


On 8/24/07, Bill Wichser <[EMAIL PROTECTED]> wrote:
> We have been experiencing frequent crashes in the PVFS2 kernel module
> when applications use standard system I/O to write to PVFS2 files. We
> are running the Linux 2.6.9-55.0.2 smp kernel, and PVFS2 v2.6.3.
> The general protection fault almost always occurs at
> pvfs2_devreq_writev+351.
>
> In our build, the invalid reference specifically occurs in the
> qhash_del() operation, within the inline qhash_search_and_remove()
> function called by pvfs2_devreq_writev(). See excerpts below:
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> static ssize_t pvfs2_devreq_writev(
>      struct file *file,
>      const struct iovec *iov,
>      unsigned long count,
>      loff_t * offset)
> {
> .
> .
> .
>      /* lookup (and remove) the op based on the tag */
>      hash_link = qhash_search_and_remove(htable_ops_in_progress, &(tag));
>      if (hash_link)
>      {
> .
> .
> .
> }
>
> /* qhash_search_and_remove()
>   *
>   * searches for and removes a link in the hash table
>   * that matches the given key
>   *
>   * returns pointer to link on success, NULL on failure (or item
>   * not found).  On success, link is removed from hashtable.
>   */
> static inline struct qhash_head *qhash_search_and_remove(
>      struct qhash_table *table,
>      void *key)
> {
>      int index = 0;
>      struct qhash_head *tmp_link = NULL;
>
>      /* find the hash value */
>      index = table->hash(key, table->table_size);
>
>      /* linear search at index to find match */
>      qhash_lock(&table->lock);
>      qhash_for_each(tmp_link, &(table->array[index]))
>      {
>          if (table->compare(key, tmp_link))
>          {
>              qhash_del(tmp_link);
>              qhash_unlock(&table->lock);
>              return (tmp_link);
>          }
>      }
>      qhash_unlock(&table->lock);
>      return (NULL);
> }
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> We have since run pvfs2-fsck on the file system and found some
> corruption, so we're not sure whether what we're seeing is just a
> second-order effect of the corruption or its actual cause.
>
> So we're passing this along to you to see if you've had any similar
> reports, or can point us in the right direction to help find the
> problem.
>
> The output of the crash utility's sys and bt commands follows. Please
> let us know if you need more information.
>
> Thanks,
> Bill
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> crash> sys
>    SYSTEM MAP: /boot/System.map-2.6.9-55.0.2.ELsmp
> DEBUG KERNEL: /home/jsbillin/vmlinux-2.6.9-55.ELsmp (2.6.9-55.ELsmp)
>      DUMPFILE: /var/crash/172.18.0.85-2007-08-06-07:47/vmcore
>          CPUS: 4
>          DATE: Fri Aug 17 11:53:17 2007
>        UPTIME: 11 days, 04:08:05
> LOAD AVERAGE: 2.49, 2.10, 1.63
>         TASKS: 96
>      NODENAME: woodhen-085
>       RELEASE: 2.6.9-55.0.2.ELsmp
>       VERSION: #1 SMP Mon Jun 25 14:12:33 EDT 2007
>       MACHINE: x86_64  (2660 Mhz)
>        MEMORY: 9 GB
>         PANIC: ""
> crash> bt
> PID: 3454   TASK: 10236b63030       CPU: 0   COMMAND: "pvfs2-client-co"
>   #0 [10232fbbc60] netpoll_start_netdump at ffffffffa0249366
>   #1 [10232fbbc90] die at ffffffff80111c00
>   #2 [10232fbbcb0] do_general_protection at ffffffff801124e5
>   #3 [10232fbbcf0] error_exit at ffffffff80110d91
>      [exception RIP: pvfs2_devreq_writev+351]
>      RIP: ffffffffa0226948  RSP: 0000010232fbbda8  RFLAGS: 00010246
>      RAX: 0000000000000000  RBX: 40903a138d84f800  RCX: 0000000000000000
>      RDX: 40903a138d84f800  RSI: 00000101aeab1bd8  RDI: 0000010232fbbdc0
>      RBP: 0000010006bccd40   R8: 0000000000000000   R9: 0000000000000000
>      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>      R13: 000001020557e600  R14: 000001020557e5f0  R15: 0000010232fbbe88
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #4 [10232fbbda0] pvfs2_devreq_writev at ffffffffa022693e
>   #5 [10232fbbe00] sock_readv_writev at ffffffff802a91f9
>   #6 [10232fbbe60] do_readv_writev at ffffffff8017a45f
>   #7 [10232fbbf40] sys_writev at ffffffff8017a631
>   #8 [10232fbbf80] system_call at ffffffff8011026a
>      RIP: 00000035854bfcdb  RSP: 0000007fbffff228  RFLAGS: 00010202
>      RAX: 0000000000000014  RBX: ffffffff8011026a  RCX: 00000035854bf1e9
>      RDX: 0000000000000004  RSI: 0000007fbffff120  RDI: 0000000000000005
>      RBP: 0000000000000000   R8: 0000000000000001   R9: 0000000000000004
>      R10: 0000000000000001  R11: 0000000000000206  R12: 0000000000000005
>      R13: 0000007fbffff120  R14: 0000000000000004  R15: 0000000000000000
>      ORIG_RAX: 0000000000000014  CS: 0033  SS: 002b
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>