Yes, it does finish the Trove Migration and prints messages like those. The file system responds to requests; I just can't create files larger than one strip. Once I restart the file system I can, but on the first start they fail.
Bart.

On Thu, Apr 29, 2010 at 1:50 PM, Kevin Harms <[email protected]> wrote:
> Bart,
>
>   I think the server should print out when conversion starts and ends.
>
> examples:
>   Trove Migration Started: Ver=2.6.3
>   Trove Migration Complete: Ver=2.6.3
>   Trove Migration Set: 2.8.1
>
>   Does it get that far?
>
> kevin
>
> On Apr 29, 2010, at 1:55 PM, Bart Taylor wrote:
>
> > Thanks for the information and the suggestion, Phil. Unfortunately, I didn't get a different result after moving that BMI init block. I also managed to reproduce this once while leaving the Trove method set to alt-aio, although that doesn't seem directly related to the direction you were going.
> >
> > Another thing I noticed is that I can create files successfully after the upgrade as long as the size stays within 64k, which is the value of my strip_size distribution parameter. Once the size exceeds that value, I start running into this problem again.
> >
> > Does that help shed any more light on my situation?
> >
> > Bart.
> >
> >
> > On Fri, Apr 16, 2010 at 1:39 PM, Phil Carns <[email protected]> wrote:
> > Sadly, none of my test boxes will run 2.6 any more, but I have a theory about what the problem might be here.
> >
> > For some background, the pvfs2-server daemon performs these steps in order (among others): it initializes BMI (networking), initializes Trove (storage), and then finally starts processing requests.
> >
> > In your case, two extra things are going on:
> >
> > - the Trove initialization may take a while, because it has to convert the format of all objects from v2.6 to v2.8, especially if it is also switching to the o_direct format at the same time.
> >
> > - whichever server gets done first is going to immediately contact the other servers in order to precreate handles for new files (a new feature in 2.8).
> >
> > I'm guessing that one server finished the Trove conversion before the others and started its precreate requests.
> > The other servers can't answer yet (because they are still busy with Trove), but since BMI is already running, the incoming precreate requests just get queued up on the socket. When the slow server finally does try to service them, the requests are way out of date and have since been retried by the fast server.
> >
> > I'm not sure exactly what goes wrong from there, but if that's the cause, the solution might be relatively simple. If you look in pvfs2-server.c, you can take the block of code from "BMI_initialize(...)" through "*server_status_flag |= SERVER_BMI_INIT;" and try moving that whole block to _after_ the "*server_status_flag |= SERVER_TROVE_INIT;" line that indicates that Trove is done.
> >
> > -Phil
> >
> >
> > On 03/30/2010 06:23 PM, Bart Taylor wrote:
> >>
> >> I am having some problems upgrading existing file systems to 2.8. After I finish the upgrade and start the file system, I cannot create files. Simple commands like dd and cp stall until they time out and leave partial dirents like this:
> >>
> >> [bat...@client t]$ dd if=/dev/zero of=/mnt/pvfs28/10MB.dat.6 bs=1M count=10
> >> dd: writing `/mnt/pvfs28/10MB.dat.6': Connection timed out
> >> 1+0 records in
> >> 0+0 records out
> >> 0 bytes (0 B) copied, 180.839 seconds, 0.0 kB/s
> >>
> >>
> >> [r...@client ~]# ls -alh /mnt/pvfs28/
> >> total 31M
> >> drwxrwxrwt 1 root   root   4.0K Mar 30 11:24 .
> >> drwxr-xr-x 4 root   root   4.0K Mar 23 13:38 ..
> >> -rw-rw-r-- 1 batayl batayl  10M Mar 30 08:44 10MB.dat.1
> >> -rw-rw-r-- 1 batayl batayl  10M Mar 30 08:44 10MB.dat.2
> >> -rw-rw-r-- 1 batayl batayl  10M Mar 30 08:44 10MB.dat.3
> >> ?--------- ? ?      ?         ?            ? 10MB.dat.5
> >> drwxrwxrwx 1 root   root   4.0K Mar 29 14:06 lost+found
> >>
> >>
> >> This happens both on local disk and on network storage, but it only happens if the upgraded file system starts up for the first time using directio. If it is started with alt-aio as the TroveMethod, everything works as expected.
> >> It also only happens the first time the file system is started; if I stop the server daemons and restart them, everything operates as expected. I do have to kill -9 the server daemons, since they will not exit gracefully.
> >>
> >> My test is running on RHEL4 U8 i386 with kernel version 2.6.9-89.ELsmp, with two server nodes and one client. I was unable to recreate the problem with a single server.
> >>
> >> I attached verbose server logs from the time the daemon was started after the upgrade until the client failed, as well as client logs from mount until the returned error. The Cliffs Notes version is that one of the servers logs as many unstuff requests as we have client retries configured, and the client fails at the end of the allotted retries. The other server doesn't log anything after starting.
> >>
> >> Has anyone seen anything similar, or know what might be going on?
> >>
> >> Bart.
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
