We have recently been experiencing performance problems within our cell, such that a user can wait several minutes to open or update a file. Checks (e.g. uptime) on the server show no obvious problems. (We have only one server; we know that it is old and slow, and its connection speed is only 100 Mb/s.) However, in the AFS FileLog we have many messages of the form:

Thu Jan 31 08:53:59 2008 CB: Call back connect back failed (in break delayed) for 83eafe37.7001
Thu Jan 31 08:53:59 2008 BreakDelayedCallbacks FAILED for host 83eafe37 which IS UP. Possible network or routing failure.
Thu Jan 31 08:53:59 2008 MultiProbe failed to find new address for host83eafe37.7001

Having extracted the records for today (31/1/2008) up to 8:53, I find there are 8897 records. Taking the hex host identifiers and converting them to IP addresses and names, I get the following list:

83ea6a26 = 131.234.106.38 = barrow.math.uni-paderborn.de
83ea6ab6 = 131.234.106.182 = reynolds.math.uni-paderborn.de
83ea6ac6 = 131.234.106.198 = fitting.math.uni-paderborn.de
83ea6c23 = 131.234.108.35 = pizza.math.uni-paderborn.de
83ea70cb = 131.234.112.203 = yang.ifim.uni-paderborn.de
83eafe33 = 131.234.254.51 = albert.et.uni-paderborn.de
83eafe36 = 131.234.254.54 = ysabell.et.uni-paderborn.de
83eafe37 = 131.234.254.55 = ridcully.et.uni-paderborn.de
83eafe38 = 131.234.254.56 = esme.et.uni-paderborn.de
83eafe39 = 131.234.254.57 = gytha.et.uni-paderborn.de
83eafe3a = 131.234.254.58 = magrat.et.uni-paderborn.de
83eafe3b = 131.234.254.59 = teppic.et.uni-paderborn.de
83eafe3d = 131.234.254.61 = schelter.et.uni-paderborn.de
83eafe40 = 131.234.254.64 = detritus.et.uni-paderborn.de
83eafe41 = 131.234.254.65 = colon.et.uni-paderborn.de
83eafe42 = 131.234.254.66 = nobbs.et.uni-paderborn.de
83eafe43 = 131.234.254.67 = poons.et.uni-paderborn.de
83eafe45 = 131.234.254.69 = quirm.et.uni-paderborn.de
83eafe48 = 131.234.254.72 = champot.et.uni-paderborn.de

Each of these addresses is referenced more than 100 times, and 3 of them more than 1000 times (just today).

Is anyone from this cell on the list? If so, please would they contact me to see if we can resolve this problem? In the past, this kind of failure has been traced to either a site router problem or a firewall that has closed access to port 7001.

Any comments from other people on the list would also be helpful.

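For anyone who wants to repeat the conversion on their own FileLog, here is a minimal Python sketch of the idea. It is not the script I used; the function names, the regular expression and the log path on the command line are all assumptions made for illustration. It reads the 8-digit hex host ID as a big-endian IPv4 address and counts only the "BreakDelayedCallbacks FAILED" lines, so its totals will not necessarily match the record counts quoted above.

```python
import ipaddress
import re
import sys
from collections import Counter

# Assumed pattern: matches the hex host ID in lines such as
# "BreakDelayedCallbacks FAILED for host 83eafe37 which IS UP."
HOST_RE = re.compile(r"FAILED for host ([0-9a-fA-F]{8})\b")


def hex_to_ip(host_hex):
    """Interpret an 8-digit hex host ID as a big-endian IPv4 address."""
    return str(ipaddress.IPv4Address(int(host_hex, 16)))


def count_failed_hosts(lines):
    """Count BreakDelayedCallbacks failures per client IP address."""
    counts = Counter()
    for line in lines:
        for match in HOST_RE.finditer(line):
            counts[hex_to_ip(match.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_hosts.py /path/to/FileLog
    with open(sys.argv[1], errors="replace") as log:
        for ip, n in count_failed_hosts(log).most_common():
            print(n, ip)
```

The names in the list above would then come from a reverse DNS lookup on each address (for example with socket.gethostbyaddr), which is left out here so the sketch runs without network access.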
Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory
_______________________________________________
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info
