On Thu, Apr 16, 2009 at 11:11 AM, Rainer Toebbicke <[email protected]> wrote:
> The attached patch causes a transient failure to create a volume transaction
> to be retried, brutally three times in 1 sec intervals.
>
> The problem usually only affects servers with ten-thousands of volumes,
> where a simple "vos listvol" could easily disturb a simultaneous "vos
> backupsys", or one out of two simultaneous "vos listvols" could print
> thousands of error messages depending on how they race.
>
> Note: The patch "undoes" another patch (and its correction) in that area -
> that's not elegant but ok as it predates that patch and fixes both problems
> in one go, for the first one in a slightly different manner.
>

I don't think this patch is the right approach to the problem.  Server
threads should be treated as a precious resource.  Holding calls
active for several seconds at a time to circumvent the fact that
vos/libadmin lack sufficient retry logic is a suboptimal solution
which will reduce volop parallelism in large production environments.
Granted, having vos clients poll every few seconds is also suboptimal
from a fair queuing perspective, but there are other, better solutions
to that problem which don't unnecessarily tie up server threads.

Cheers,

-Tom



> bcc'ed to openafs-bugs
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Rainer Toebbicke
> European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
> Phone: +41 22 767 8985       Fax: +41 22 767 7155
>
> Index: openafs/src/volser/voltrans.c
> ===================================================================
> RCS file: /cvs/openafs/src/volser/voltrans.c,v
> retrieving revision 1.10.2.5
> diff -u -r1.10.2.5 voltrans.c
> --- openafs/src/volser/voltrans.c       18 Oct 2008 19:26:17 -0000
>  1.10.2.5
> +++ openafs/src/volser/voltrans.c       16 Apr 2009 14:21:37 -0000
> @@ -75,21 +75,31 @@
>  NewTrans(afs_int32 avol, afs_int32 apart)
>  {
>     /* set volid, next, partition */
> -    struct volser_trans *tt, *newtt;
> +    struct volser_trans *tt;
>     struct timeval tp;
>     struct timezone tzp;
> +    int retries = 3;
>
> -    newtt = (struct volser_trans *)malloc(sizeof(struct volser_trans));
>     VTRANS_LOCK;
>     /* don't allow the same volume to be attached twice */
> -    for (tt = allTrans; tt; tt = tt->next) {
> -       if ((tt->volid == avol) && (tt->partition == apart)) {
> -           VTRANS_UNLOCK;
> -           free(newtt);
> -           return (struct volser_trans *)0;    /* volume busy */
> +    while (1) {
> +       for (tt = allTrans; tt; tt = tt->next) {
> +           if ((tt->volid == avol) && (tt->partition == apart)) break;
>        }
> +       if (tt == NULL) break;
> +
> +       VTRANS_UNLOCK;
> +       if (retries-- > 0)
> +#ifdef AFS_PTHREAD_ENV
> +           sleep(1);           /* Allow for short lock-ups */
> +#else
> +           IOMGR_Sleep(1);     /* Allow for short lock-ups */
> +#endif /*AFS_PTHREAD_ENV*/
> +       else
> +           return (struct volser_trans *)0;    /* volume busy */
> +       VTRANS_LOCK;
>     }
> -    tt = newtt;
> +    tt = (struct volser_trans *)malloc(sizeof(struct volser_trans));
>     memset(tt, 0, sizeof(struct volser_trans));
>     tt->volid = avol;
>     tt->partition = apart;
>
>



-- 
Tom Keiser
[email protected]
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel

Reply via email to