Juan: We have also been able to recreate your problem with startup and creating the root directory information. We are working now to put a fix in place.
Becky

On Fri, May 22, 2015 at 7:20 PM, Boyd Wilson <[email protected]> wrote:

> Default mode still uses PKI to some degree, but all of the expensive
> signing operations are done without keys or with minimal keys. Time drift
> may still affect it (possibly; I will have to check with the developers
> who are more familiar with that code).
>
> -b
>
> On Fri, May 22, 2015 at 7:09 PM Juan PC <[email protected]> wrote:
>
>> Good to know :-). However, I use the default security mode (the old one,
>> I think).
>>
>> Regards,
>>
>> Juan
>>
>> On 23/05/15 at 00:52, Boyd Wilson wrote:
>> > The new capability-based security uses PKI, so it is time dependent
>> > and time drift could cause problems. As far as I can tell we have not
>> > documented this, so we need to do so.
>> >
>> > -b
>> >
>> > On Fri, May 22, 2015 at 6:49 PM Juan PC <[email protected]> wrote:
>> >
>> > Hi Becky,
>> >
>> > When I tried to set up an OrangeFS cluster with 4 and 8 nodes, the
>> > batch_create error message appeared again. Then I realized that some
>> > of my nodes had the wrong time (with a maximum difference of two and
>> > a half hours between nodes). After synchronizing the times, the
>> > batch_create problem seems to be gone. Does this make sense? I mean,
>> > can a wrong time on some servers cause the problem? I do not remember
>> > seeing any recommendation or warning about node times in the OrangeFS
>> > documentation.
>> >
>> > Regards,
>> >
>> > Juan
>> >
>> > On 16/05/15 at 22:59, Becky Ligon wrote:
>> > > Juan:
>> > >
>> > > The conf file looks good. Can you send me your server log files?
>> > >
>> > > Becky
>> > >
>> > > On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
>> > >
>> > > It is attached.
>> > >
>> > > I do not know if this is important, but one thing that I have seen
>> > > with this configuration file is that if I run the second server
>> > > just after running the first server, everything seems to work.
>> > > However, if I wait for a few seconds, the error message about the
>> > > root directory appears in the first server. Then, when I launch the
>> > > second server, I get the avalanche of batch_create error messages.
>> > > This avalanche seems to stop when it has generated around 1 GB of
>> > > data. However, because of the problem with the root directory, the
>> > > file system does not work.
>> > >
>> > > I have checked whether waiting for a few seconds between server
>> > > executions is an issue in OrangeFS 2.8.7, and it is not.
>> > >
>> > > Regards,
>> > >
>> > > Juan
>> > >
>> > > On 16/05/15 at 17:59, Becky Ligon wrote:
>> > > > Can you send me your orangefs-server.conf file?
>> > > >
>> > > > NOTE: do not use native IB with this version. We have a known
>> > > > issue with distributed directories and IB that we are currently
>> > > > working on.
>> > > >
>> > > > Becky
>> > > >
>> > > > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
>> > > >
>> > > > No, only TCP over Ethernet. We have IB NICs, but I have not
>> > > > compiled OrangeFS with support for them.
>> > > >
>> > > > Juan
>> > > >
>> > > > Quoting "Becky Ligon" <[email protected]>:
>> > > >
>> > > > Are you using native IB?
>> > > >
>> > > > Becky
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > Well, your configuration can probably avoid the problem with the
>> > > > benchmark, which I cannot run because the creation of the
>> > > > OrangeFS file system fails.
>> > > >
>> > > > The batch_create error is still there because it appears just
>> > > > when I launch the servers. The creation of the root directory
>> > > > fails too, as I have mentioned. I think this is the relevant part
>> > > > of the log messages regarding the problem with the root
>> > > > directory:
>> > > >
>> > > > [D 05/15/2015 21:08:37] server_post_unexpected_recv
>> > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 999
>> > > > [D 05/15/2015 21:08:37] Initialization completed successfully.
>> > > > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
>> > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 27
>> > > > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
>> > > > [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
>> > > > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>> > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
>> > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
>> > > > [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
>> > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
>> > > > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>> > > > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
>> > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 46
>> > > > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
>> > > > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
>> > > > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
>> > > > [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
>> > > > [D 05/15/2015 21:08:37]
>> > > > [D 05/15/2015 21:08:37] creating 1 local dirdata files
>> > > > [D 05/15/2015 21:08:37] creating 1 remote dirdata files
>> > > > [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
>> > > > [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
>> > > > [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
>> > > > [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
>> > > > [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
>> > > > [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
>> > > > [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
>> > > >
>> > > > Hope this helps.
>> > > >
>> > > > Regards,
>> > > >
>> > > > Juan
>> > > >
>> > > > On 15/05/15 at 22:13, Becky Ligon wrote:
>> > > >
>> > > > Juan:
>> > > >
>> > > > You may have hit upon another problem that we've encountered,
>> > > > where the splitting of directories goes into a race condition.
>> > > > Try this:
>> > > >
>> > > > 1. In your orangefs-server.conf file, set DistrDirServersInitial 1
>> > > >    and DistrDirServersMax 1 in your multi-server configuration
>> > > >    installation.
>> > > >
>> > > > 2. Delete your data and metadata areas and recreate them. Start
>> > > >    your servers.
>> > > >
>> > > > 3. Run your tests.
>> > > >
>> > > > See if this helps!
>> > > >
>> > > > NOTE: We are working on a fix for this problem right now but
>> > > > don't have a working solution just yet.
>> > > >
>> > > > Becky
>> > > >
>> > > > On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
>> > > >
>> > > > Hi Becky,
>> > > >
>> > > > Thank you for your response :-)
>> > > >
>> > > > The problem is that the log file grows at a rate of around 2 MiB
>> > > > per second (EventLogging is set to none!) and, more importantly,
>> > > > a simple pvfs2-ls does not work. The latter is probably due to an
>> > > > error message that I get after starting the server that stores
>> > > > the root file system:
>> > > >
>> > > > [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
>> > > > [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
>> > > >
>> > > > although the batch_create errors appear later, when a second
>> > > > server is run.
>> > > >
>> > > > I have spent a lot of time trying different compilation options,
>> > > > configurations, and db versions, checking that I run the right
>> > > > executables, that they use the same filesystem configuration
>> > > > file, etc., and the result is always the same. Well, to be
>> > > > honest, I was able to activate the file system once (I do not
>> > > > know how), but it started failing when I tried to create a few
>> > > > thousand files per directory (benchmark hpcs-io_1.2.0-rc1,
>> > > > scenarios 9-12).
>> > > >
>> > > > My feeling is that, with two servers, the problematic server (the
>> > > > one meant to store the root directory) does not communicate
>> > > > correctly with the second server. There is no firewall, SELinux
>> > > > is disabled, etc.
>> > > >
>> > > > Some final remarks:
>> > > > - Security is always the default one; I have not used either the
>> > > >   --enable-security-key or the --enable-security-cert option.
>> > > > - Same steps with OrangeFS 2.8.7 and no problem at all.
>> > > >
>> > > > So I guess that I must be doing something terribly wrong, but I
>> > > > do not know what :-(
>> > > >
>> > > > If I can do something (for instance, running the servers with
>> > > > EventLogging set to verbose), just let me know.
>> > > >
>> > > > Regards,
>> > > >
>> > > > Juan
>> > > >
>> > > > On 15/05/15 at 20:12, Becky Ligon wrote:
>> > > >
>> > > > This is normal for 2.9.1, and it is okay to get the messages you
>> > > > are seeing. batch_create comes into play when a server needs to
>> > > > gather more handles (like inodes) from another server. The
>> > > > "Resource temporarily unavailable" is generated when the
>> > > > capability associated with this request has timed out. So, the
>> > > > calling server regenerates the capability and resends the
>> > > > batch_create request.
>> > > >
>> > > > The OFS development team is changing when these capabilities get
>> > > > generated for batch_create requests to alleviate this problem.
>> > > > For now, you can ignore these messages.
>> > > >
>> > > > Sorry for the inconvenience.
>> > > >
>> > > > Becky
>> > > >
>> > > > On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
>> > > >
>> > > > Dear Becky,
>> > > >
>> > > > I am trying to use orangefs-2.9.1, but every time I run the
>> > > > servers I get the message of the subject in one of the servers,
>> > > > and its log file grows very quickly. The last reference that I
>> > > > have seen about this problem is
>> > > > http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
>> > > >
>> > > > I have used the --disable-capcache option of configure, but with
>> > > > the same result. Do you know if this issue has already been fixed
>> > > > or if there is a workaround?
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Juan
>> > > >
>> > > > ----------------------------------------------------------------
>> > > > This message was sent using IMP, the Internet Messaging Program.
>> > >
>> > > --
>> > > Sent from Gmail Mobile
>> >
>> > --
>> > D. Juan Piernas Cánovas
>> > Departamento de Ingeniería y Tecnología de Computadores
>> > Facultad de Informática. Universidad de Murcia
>> > Campus de Espinardo - 30080 Murcia (SPAIN)
>> > Tel.: +34868887657 Fax: +34868884151
>> > email: [email protected]
>> > PGP public key:
>> > http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
>> >
>> > *** Please send me your documents in text, HTML, PDF or
>> > PostScript format :-) ***
>> > _______________________________________________
>> > Pvfs2-users mailing list
>> > [email protected]
>> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
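
A note on the clock observation in the thread: Juan reports nodes up to two and a half hours apart, and the batch_create errors disappearing after the clocks were synchronized. A quick skew check before starting the servers might look like the sketch below. This is only an illustration and not part of OrangeFS: it assumes the per-node epoch timestamps have already been collected by some means (for example by running `date +%s` on each node), and the node names, sample values, and 5-second tolerance are all hypothetical.

```python
from statistics import median
from typing import Dict, List


def max_clock_skew(node_times: Dict[str, float]) -> float:
    """Return the largest difference, in seconds, between any two node clocks."""
    if not node_times:
        return 0.0
    values = list(node_times.values())
    return max(values) - min(values)


def nodes_out_of_sync(node_times: Dict[str, float], tolerance_s: float) -> List[str]:
    """Return the sorted names of nodes whose clock differs from the
    median of all clocks by more than tolerance_s seconds."""
    if not node_times:
        return []
    reference = median(node_times.values())
    return sorted(
        name
        for name, timestamp in node_times.items()
        if abs(timestamp - reference) > tolerance_s
    )


# Hypothetical sample: three nodes, one of them 2.5 hours (9000 s) ahead.
sample_times = {
    "node-a": 1432330000.0,
    "node-b": 1432330002.0,
    "node-c": 1432330000.0 + 9000.0,
}

skew = max_clock_skew(sample_times)
late_nodes = nodes_out_of_sync(sample_times, tolerance_s=5.0)
print(f"max skew: {skew:.0f} s, out of sync: {late_nodes}")
```

The median is used as the reference so that a single badly drifted node does not shift the baseline; what tolerance is actually safe for the capability timeouts is something the developers would have to confirm.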
