The new capability-based security uses PKI, so it is time dependent, and clock drift between nodes could cause problems. As far as I can tell we have not documented this, so we need to do so.
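
To illustrate why a time-limited capability is sensitive to clock skew, here is a minimal sketch. It is not the OrangeFS implementation: the `Capability` class, the `is_expired` check, and the 600-second timeout are all hypothetical, chosen only to show the effect. The one figure taken from this thread is the two-and-a-half-hour (9000-second) difference between nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical time-limited token; NOT the real OrangeFS structure."""
    issued_at: float  # issuer's wall-clock time, seconds since the epoch
    timeout: float    # validity window in seconds (illustrative value)

    @property
    def expires_at(self) -> float:
        return self.issued_at + self.timeout


def is_expired(cap: Capability, verifier_now: float) -> bool:
    """The verifier compares the expiry time against its OWN clock."""
    return verifier_now >= cap.expires_at


def check_with_skew(skew_seconds: float, timeout: float = 600.0) -> bool:
    """Return True if a freshly issued capability already looks expired
    to a verifier whose clock runs `skew_seconds` ahead of the issuer."""
    issuer_now = 1_000_000.0  # arbitrary fixed instant, for reproducibility
    cap = Capability(issued_at=issuer_now, timeout=timeout)
    verifier_now = issuer_now + skew_seconds
    return is_expired(cap, verifier_now)


if __name__ == "__main__":
    # Clocks in sync: the new capability is accepted.
    print("in sync:", check_with_skew(0.0))
    # Verifier 2.5 hours ahead: the capability looks expired on arrival.
    print("2.5 h skew:", check_with_skew(2.5 * 3600))
```

Under these assumptions, any skew larger than the timeout makes every new capability look expired as soon as it arrives, so regenerating and resending it cannot succeed until the clocks are synchronized.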
-b

On Fri, May 22, 2015 at 6:49 PM Juan PC <[email protected]> wrote:

> Hi Becky,
>
> When I have tried to set up an OrangeFS cluster with 4 and 8 nodes, the
> batch_create error message has appeared again. Then, I have realized
> that some of my nodes had a wrong time (with a maximum difference of two
> hours and a half between nodes). After synchronizing the times, the
> batch_create problem seems to be gone. Does this make sense? I mean, can
> a wrong time in some servers cause the problem? I do not remember seeing
> any recommendation or warning about node times in the OrangeFS
> documentation?
>
> Regards,
>
> Juan
>
> On 16/05/15 at 22:59, Becky Ligon wrote:
> > Juan:
> >
> > The conf file looks good. Can you send me your server log files?
> >
> > Becky
> >
> > On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
> >
> >     It is attached.
> >
> >     I do not know if this is important, but one thing that I have seen with
> >     this configuration file is that if I run the second server just after
> >     running the first server, everything seems to work. However, if I wait
> >     for a few seconds, the error message of the root directory appears in
> >     the first server. Then, when I launch the second server, I get the
> >     avalanche of batch_create error messages. This avalanche seems to stop
> >     when it has generated around 1 GB of data. However, because of the
> >     problem with the root directory, the file system does not work.
> >
> >     I have checked if waiting for a few seconds between server executions is
> >     an issue in OrangeFS 2.8.7 and it is not.
> >
> >     Regards,
> >
> >     Juan
> >
> >     On 16/05/15 at 17:59, Becky Ligon wrote:
> > > Can you send me your orangefs-server.conf file?
> > >
> > > NOTE: do not use native IB with this version. We have a known issue
> > > with distributed directories and IB that we are currently working on.
> > >
> > > Becky
> > >
> > > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> > >
> > >     No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> > >     OrangeFS with support for them.
> > >
> > >     Juan
> > >
> > >
> > >     Quoting "Becky Ligon" <[email protected]>:
> > >
> > >         Are you using native IB?
> > >
> > >         Becky
> > >
> > >         Sent from my iPhone
> > >
> > >         On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> > >
> > >             Hi,
> > >
> > >             Well, your configuration can probably avoid the problem with the
> > >             benchmark, which I can not run because the creation of the
> > >             OrangeFS fails.
> > >
> > >             The batch_create error is still there because it appears just when I
> > >             launch the servers. The creation of the root directory fails too, as I
> > >             have mentioned. I think this is the relevant part of the log messages
> > >             regarding the problem with the root directory:
> > >
> > >             [D 05/15/2015 21:08:37] server_post_unexpected_recv
> > >             [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> > >             [D 05/15/2015 21:08:37] Initialization completed successfully.
> > >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> > >             [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> > >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> > >             [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> > >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > >             [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> > >             [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> > >             [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> > >             [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> > >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> > >             [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> > >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> > >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> > >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> > >             [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
> > >             [D 05/15/2015 21:08:37]
> > >             [D 05/15/2015 21:08:37] creating 1 local dirdata files
> > >             [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> > >             [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> > >             [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> > >             [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
> > >             [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> > >             [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> > >             [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> > >             [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> > >
> > >             Hope this helps.
> > >
> > >             Regards,
> > >
> > >             Juan
> > >
> > >
> > >             On 15/05/15 at 22:13, Becky Ligon wrote:
> > >                 Juan:
> > >
> > >                 You may have hit upon another problem that we've encountered where the
> > >                 splitting of directories goes into a race condition. Try this:
> > >
> > >                 1. In your orangefs-server.conf file, set DistrDirServersInitial 1 and
> > >                    DistrDirServersMax 1 in your multi-server configuration installation.
> > >
> > >                 2. Delete your data and metadata areas and recreate. Start your servers.
> > >
> > >                 3. Run your tests.
> > >
> > >                 See if this helps!
> > >
> > >                 NOTE: We are working on a fix for this problem right now but don't have
> > >                 a working solution just yet.
> > >
> > >                 Becky
> > >
> > >                 On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> > >
> > >                     Hi Becky,
> > >
> > >                     Thank you for your response :-)
> > >
> > >                     The problem is that the log file grows at a rate of around 2 MiB per
> > >                     second (EventLogging is set to none!) and, more importantly, a simple
> > >                     pvfs2-ls does not work. The latter is probably due to an error message
> > >                     that I get after starting the server that stores the root file system:
> > >
> > >                     [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> > >                     [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
> > >
> > >                     although the batch_create errors appear after, when a second server
> > >                     is run.
> > >
> > >                     I have spent a lot of time trying different compilation options,
> > >                     configurations, db versions, checking that I run the right executables,
> > >                     that they use the same filesystem configuration file, etc., and the
> > >                     result is always the same. Well, to be honest, I was able to activate
> > >                     the file system once (I do not know how), but it started failing when I
> > >                     tried to create a few thousand files per directory (benchmark
> > >                     hpcs-io_1.2.0-rc1, scenarios 9-12).
> > >
> > >                     My feeling is that, with two servers, the problematic server (the one
> > >                     aimed at storing the root directory) does not communicate correctly with
> > >                     the second server. There is no firewall, SELinux is disabled, etc.
> > >
> > >                     Some final remarks:
> > >                     - Security is always the default one, I have not used either
> > >                       --enable-security-key or --enable-security-cert option.
> > >                     - Same steps with OrangeFS 2.8.7 and no problem at all.
> > >
> > >                     So I guess that I should be doing something terribly wrong, but I do not
> > >                     know what :-(
> > >
> > >                     If I can do something (for instance, running the servers with
> > >                     EventLogging set to verbose), just let me know.
> > >
> > >                     Regards,
> > >
> > >                     Juan
> > >
> > >                     On 15/05/15 at 20:12, Becky Ligon wrote:
> > >                         This is normal for 2.9.1 and okay to get the messages you are seeing.
> > >                         batch_create comes into play when a server needs to gather more handles
> > >                         (like inodes) from another server. The "Resource temporarily
> > >                         unavailable" is generated when the capability associated with this
> > >                         request has timed out. So, the calling server regenerates the
> > >                         capability and resends the batch_create request.
> > >
> > >                         The OFS development team is changing when these capabilities get
> > >                         generated for batch_create requests to alleviate this problem. For now,
> > >                         you can ignore these messages.
> > >
> > >                         Sorry for the inconvenience.
> > >
> > >                         Becky
> > >
> > >                         On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
> > >
> > >                             Dear Becky,
> > >
> > >                             I am trying to use orangefs-2.9.1, but every time I run the
> > >                             servers I get the message of the subject in one of the servers,
> > >                             and its log file grows very quickly. The last reference that I
> > >                             have seen about this problem is
> > >                             http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html .
> > >
> > >                             I have used option --disable-capcache of configure, but same
> > >                             result. Do you know if this issue has been already fixed or if
> > >                             there is a workaround?
> > >
> > >                             Best regards,
> > >
> > >                             Juan
> > >
> > >
> > >     ----------------------------------------------------------------
> > >     This message was sent using IMP, the Internet Messaging Program.
> > >
> >
> > --
> > Sent from Gmail Mobile
>
> --
> D. Juan Piernas Cánovas
> Departamento de Ingeniería y Tecnología de Computadores
> Facultad de Informática. Universidad de Murcia
> Campus de Espinardo - 30080 Murcia (SPAIN)
> Tel.: +34868887657 Fax: +34868884151
> email: [email protected]
> PGP public key:
> http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
>
> *** Please send me your documents in text, HTML, PDF or PostScript format :-) ***
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
