Juan:

We have also been able to recreate your problem with startup and creating
the root directory information.  We are working now to put a fix in place.

Becky

On Fri, May 22, 2015 at 7:20 PM, Boyd Wilson <[email protected]> wrote:

> Default mode still uses PKI to some degree, but all of the expensive
> signing operations are done with minimal or no keys. Time drift may still
> affect it (possibly; I will have to check with the developers who are
> more familiar with that code).
>
> -b
>
> On Fri, May 22, 2015 at 7:09 PM Juan PC <[email protected]> wrote:
>
>> Good to know :-). However, I use the default security mode (the old one,
>> I think).
>>
>> Regards,
>>
>>         Juan
>>
>> On 23/05/15 at 00:52, Boyd Wilson wrote:
>> > The new capability-based security uses PKI, so it is time-dependent and
>> > time drift could cause problems.  As far as I can tell, we have not
>> > documented this, so we need to do so.
>> >
>> > -b
>> >
>> > On Fri, May 22, 2015 at 6:49 PM Juan PC <[email protected]> wrote:
>> >
>> >     Hi Becky,
>> >
>> >     When I tried to set up an OrangeFS cluster with 4 and 8 nodes, the
>> >     batch_create error message appeared again. Then I realized that
>> >     some of my nodes had the wrong time (with a maximum difference of
>> >     two and a half hours between nodes). After synchronizing the times,
>> >     the batch_create problem seems to be gone. Does this make sense? I
>> >     mean, can a wrong time on some servers cause the problem? I do not
>> >     remember seeing any recommendation or warning about node times in
>> >     the OrangeFS documentation.
>> >
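The clock check described above can be sketched roughly as follows. This is a minimal illustration, not anything taken from OrangeFS: the function names, the node names, and the 5-second tolerance are all hypothetical, and the timestamps are assumed to have been collected from each node by some other means (for example, by running `date` over ssh).

```python
from datetime import datetime, timedelta


def max_clock_skew(node_times):
    """Return (skew, earliest_node, latest_node) for a node -> datetime mapping."""
    earliest = min(node_times, key=node_times.get)
    latest = max(node_times, key=node_times.get)
    return node_times[latest] - node_times[earliest], earliest, latest


def nodes_out_of_sync(node_times, reference, tolerance):
    """List nodes whose clock differs from `reference` by more than `tolerance`."""
    return sorted(
        name for name, t in node_times.items() if abs(t - reference) > tolerance
    )


if __name__ == "__main__":
    # Hypothetical readings: one node is two and a half hours ahead.
    base = datetime(2015, 5, 22, 18, 0, 0)
    sample = {
        "node1": base,
        "node2": base + timedelta(hours=2, minutes=30),
        "node3": base + timedelta(seconds=1),
    }
    skew, first, last = max_clock_skew(sample)
    print("max skew:", skew, "between", first, "and", last)
    # The 5-second tolerance is an arbitrary choice for illustration.
    print("out of sync:", nodes_out_of_sync(sample, base, timedelta(seconds=5)))
```

The thread does not state how much drift the servers tolerate, so the threshold here is only a placeholder; keeping all nodes synchronized (for instance with NTP) avoids having to guess.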
>> >     Regards,
>> >
>> >             Juan
>> >
>> >     On 16/05/15 at 22:59, Becky Ligon wrote:
>> >     > Juan:
>> >     >
>> >     > The conf file looks good.  Can you send me your server log files?
>> >     >
>> >     > Becky
>> >     >
>> >     > On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
>> >     >
>> >     >     It is attached.
>> >     >
>> >     >     I do not know if this is important, but one thing that I have
>> >     >     seen with this configuration file is that if I run the second
>> >     >     server just after running the first server, everything seems to
>> >     >     work. However, if I wait for a few seconds, the root directory
>> >     >     error message appears in the first server. Then, when I launch
>> >     >     the second server, I get the avalanche of batch_create error
>> >     >     messages. This avalanche seems to stop when it has generated
>> >     >     around 1 GB of data. However, because of the problem with the
>> >     >     root directory, the file system does not work.
>> >     >
>> >     >     I have checked whether waiting for a few seconds between server
>> >     >     executions is an issue in OrangeFS 2.8.7, and it is not.
>> >     >
>> >     >     Regards,
>> >     >
>> >     >             Juan
>> >     >
>> >     >     On 16/05/15 at 17:59, Becky Ligon wrote:
>> >     >     > Can you send me your orangefs-server.conf file?
>> >     >     >
>> >     >     > NOTE:  Do not use native IB with this version.  We have a
>> >     >     > known issue with distributed directories and IB that we are
>> >     >     > currently working on.
>> >     >     >
>> >     >     > Becky
>> >     >     >
>> >     >     > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
>> >     >     >
>> >     >     >     No, only TCP over Ethernet. We have IB NICs, but I have
>> >     >     >     not compiled OrangeFS with support for them.
>> >     >     >
>> >     >     >            Juan
>> >     >     >
>> >     >     >
>> >     >     >     Quoting "Becky Ligon" <[email protected]>:
>> >     >     >
>> >     >     >         Are you using native IB?
>> >     >     >
>> >     >     >         Becky
>> >     >     >
>> >     >     >         Sent from my iPhone
>> >     >     >
>> >     >     >             On May 15, 2015, at 5:39 PM, Juan PC
>> >     >     >             <[email protected]> wrote:
>> >     >     >
>> >     >     >             Hi,
>> >     >     >
>> >     >     >             Well, your configuration can probably avoid the
>> >     >     >             problem with the benchmark, which I cannot run
>> >     >     >             because the creation of the OrangeFS file system
>> >     >     >             fails.
>> >     >     >
>> >     >     >             The batch_create error is still there because it
>> >     >     >             appears as soon as I launch the servers. The
>> >     >     >             creation of the root directory fails too, as I
>> >     >     >             have mentioned. I think this is the relevant part
>> >     >     >             of the log messages regarding the problem with
>> >     >     >             the root directory:
>> >     >     >
>> >     >     >             [D 05/15/2015 21:08:37] server_post_unexpected_recv
>> >     >     >             [D 05/15/2015 21:08:37] server_op_state_get_machine 999
>> >     >     >             [D 05/15/2015 21:08:37] Initialization completed successfully.
>> >     >     >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
>> >     >     >             [D 05/15/2015 21:08:37] server_op_state_get_machine 27
>> >     >     >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
>> >     >     >             [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
>> >     >     >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>> >     >     >             [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
>> >     >     >             [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
>> >     >     >             [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
>> >     >     >             [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
>> >     >     >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>> >     >     >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
>> >     >     >             [D 05/15/2015 21:08:37] server_op_state_get_machine 46
>> >     >     >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
>> >     >     >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
>> >     >     >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
>> >     >     >             [D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
>> >     >     >             [D 05/15/2015 21:08:37]
>> >     >     >             [D 05/15/2015 21:08:37] creating 1 local dirdata files
>> >     >     >             [D 05/15/2015 21:08:37] creating 1 remote dirdata files
>> >     >     >             [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
>> >     >     >             [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
>> >     >     >             [E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
>> >     >     >             [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
>> >     >     >             [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
>> >     >     >             [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
>> >     >     >             [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
>> >     >     >
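When a log grows as quickly as described in this thread, it can help to filter out everything except the error entries. The following is a minimal sketch that assumes each entry starts with a `[LEVEL MM/DD/YYYY HH:MM:SS]` prefix, as in the excerpt above, and that entries are not wrapped across lines. The function names are hypothetical, and the date order is inferred from the sample timestamps.

```python
import re
from datetime import datetime

# Matches entries such as "[E 05/15/2015 21:08:37] message".
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] ?(?P<msg>.*)$"
)


def parse_line(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    stamp = datetime.strptime(match.group("ts"), "%m/%d/%Y %H:%M:%S")
    return match.group("level"), stamp, match.group("msg").strip()


def error_entries(lines):
    """Keep only the entries logged at level E."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry[0] == "E"]


if __name__ == "__main__":
    sample = [
        "[D 05/15/2015 21:08:37] creating 1 remote dirdata files",
        "[E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument",
        "[E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state",
    ]
    for level, stamp, message in error_entries(sample):
        print(stamp.isoformat(), message)
```

For a real log file, passing the open file object to `error_entries` reads it line by line, which matters when the file has reached the gigabyte range.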
>> >     >     >             Hope this helps.
>> >     >     >
>> >     >     >             Regards,
>> >     >     >
>> >     >     >                Juan
>> >     >     >
>> >     >     >
>> >     >     >                 On 15/05/15 at 22:13, Becky Ligon wrote:
>> >     >     >                 Juan:
>> >     >     >
>> >     >     >                 You may have hit upon another problem that
>> >     >     >                 we've encountered, where the splitting of
>> >     >     >                 directories runs into a race condition.
>> >     >     >                 Try this:
>> >     >     >
>> >     >     >                 1.  In the orangefs-server.conf file of your
>> >     >     >                 multi-server installation, set
>> >     >     >                 DistrDirServersInitial 1 and
>> >     >     >                 DistrDirServersMax 1.
>> >     >     >
>> >     >     >                 2.  Delete your data and metadata areas and
>> >     >     >                 recreate them.  Start your servers.
>> >     >     >
>> >     >     >                 3.  Run your tests.
>> >     >     >
>> >     >     >                 See if this helps!
>> >     >     >
>> >     >     >                 NOTE:  We are working on a fix for this
>> >     >     >                 problem right now but don't have a working
>> >     >     >                 solution just yet.
>> >     >     >
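Before restarting the servers, one way to confirm that the two settings from step 1 are in place is to scan the configuration text for them. This is a minimal sketch under stated assumptions: directives are taken to appear one per line as `Name value`, lines starting with `#` are treated as comments, and lines starting with `<` as section tags. The function names and the sample excerpt are hypothetical, and the sketch does not check which section of the file the directives belong to.

```python
def read_directives(conf_text, names):
    """Collect the values of the given directive names from config text."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section tags (assumed syntax).
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in names:
            found[parts[0]] = parts[1].strip()
    return found


def distr_dir_workaround_applied(conf_text):
    """True if both distributed-directory settings are present and set to 1."""
    wanted = ("DistrDirServersInitial", "DistrDirServersMax")
    values = read_directives(conf_text, wanted)
    return all(values.get(name) == "1" for name in wanted)


if __name__ == "__main__":
    # Hypothetical excerpt; a real orangefs-server.conf contains much more.
    excerpt = """
    # settings suggested in this thread
    DistrDirServersInitial 1
    DistrDirServersMax 1
    """
    print(distr_dir_workaround_applied(excerpt))
```

The same configuration file has to be used by every server, so the check is worth running against the copy each server actually loads.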
>> >     >     >                 Becky
>> >     >     >
>> >     >     >                 On Fri, May 15, 2015 at 3:38 PM, Juan PC
>> >     >     >                 <[email protected]> wrote:
>> >     >     >
>> >     >     >                    Hi Becky,
>> >     >     >
>> >     >     >                    Thank you for your response :-)
>> >     >     >
>> >     >     >                    The problem is that the log file grows at a rate of
>> >     >     >                    around 2 MiB per second (EventLogging is set to none!)
>> >     >     >                    and, more importantly, a simple pvfs2-ls does not work.
>> >     >     >                    The latter is probably due to an error message that I
>> >     >     >                    get after starting the server that stores the root
>> >     >     >                    file system:
>> >     >     >
>> >     >     >                    [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
>> >     >     >                    [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state
>> >     >     >
>> >     >     >                    although the batch_create errors appear later, when a
>> >     >     >                    second server is run.
>> >     >     >
>> >     >     >                    I have spent a lot of time trying different compilation
>> >     >     >                    options, configurations, and db versions, checking that
>> >     >     >                    I run the right executables, that they use the same
>> >     >     >                    file system configuration file, etc., and the result is
>> >     >     >                    always the same. Well, to be honest, I was able to
>> >     >     >                    activate the file system once (I do not know how), but
>> >     >     >                    it started failing when I tried to create a few
>> >     >     >                    thousand files per directory (benchmark
>> >     >     >                    hpcs-io_1.2.0-rc1, scenarios 9-12).
>> >     >     >
>> >     >     >                    My feeling is that, with two servers, the problematic
>> >     >     >                    server (the one meant to store the root directory) does
>> >     >     >                    not communicate correctly with the second server. There
>> >     >     >                    is no firewall, SELinux is disabled, etc.
>> >     >     >
>> >     >     >                    Some final remarks:
>> >     >     >                    - Security is always the default one; I have not used
>> >     >     >                    either the --enable-security-key or the
>> >     >     >                    --enable-security-cert option.
>> >     >     >                    - Same steps with OrangeFS 2.8.7 and no problem at all.
>> >     >     >
>> >     >     >                    So I guess that I must be doing something terribly
>> >     >     >                    wrong, but I do not know what :-(
>> >     >     >
>> >     >     >                    If I can do something (for instance, running the
>> >     >     >                    servers with EventLogging set to verbose), just let me
>> >     >     >                    know.
>> >     >     >
>> >     >     >                    Regards,
>> >     >     >
>> >     >     >                            Juan
>> >     >     >
>> >     >     >                        On 15/05/15 at 20:12, Becky Ligon wrote:
>> >     >     >                     This is normal for 2.9.1, and it is okay to get the
>> >     >     >                     messages you are seeing.  batch_create comes into play
>> >     >     >                     when a server needs to gather more handles (like
>> >     >     >                     inodes) from another server.  The "Resource
>> >     >     >                     temporarily unavailable" error is generated when the
>> >     >     >                     capability associated with this request has timed out.
>> >     >     >                     So, the calling server regenerates the capability and
>> >     >     >                     resends the batch_create request.
>> >     >     >
>> >     >     >                     The OFS development team is changing when these
>> >     >     >                     capabilities get generated for batch_create requests
>> >     >     >                     to alleviate this problem.  For now, you can ignore
>> >     >     >                     these messages.
>> >     >     >
>> >     >     >                     Sorry for the inconvenience.
>> >     >     >
>> >     >     >                     Becky
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >                     On Fri, May 15, 2015 at 11:48 AM, Juan PC
>> >     >     >                     <[email protected]> wrote:
>> >     >     >
>> >     >     >                        Dear Becky,
>> >     >     >
>> >     >     >                        I am trying to use orangefs-2.9.1, but every time I
>> >     >     >                        run the servers I get the message in the subject line
>> >     >     >                        in one of the servers, and its log file grows very
>> >     >     >                        quickly. The last reference that I have seen about
>> >     >     >                        this problem is
>> >     >     >
>> >     >     >                        http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html
>> >     >     >
>> >     >     >                        I have used the --disable-capcache option of configure,
>> >     >     >                        but with the same result. Do you know if this issue has
>> >     >     >                        already been fixed or if there is a workaround?
>> >     >     >
>> >     >     >                        Best regards,
>> >     >     >
>> >     >     >                                Juan
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >
>> >      ----------------------------------------------------------------
>> >     >     >     This message was sent using IMP, the Internet Messaging
>> >     Program.
>> >     >     >
>> >     >     >
>> >     >
>> >     >
>> >     >
>> >     > --
>> >     > Sent from Gmail Mobile
>> >
>> >
>> >     --
>> >     D. Juan Piernas Cánovas
>> >     Departamento de Ingeniería y Tecnología de Computadores
>> >     Facultad de Informática. Universidad de Murcia
>> >     Campus de Espinardo - 30080 Murcia (SPAIN)
>> >     Tel.: +34868887657    Fax: +34868884151
>> >     email: [email protected]
>> >     PGP public key:
>> >
>> http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
>> >
>> >     *** Please send me your documents in text, HTML, PDF, or
>> >     PostScript format :-) ***
>> >     _______________________________________________
>> >     Pvfs2-users mailing list
>> >     [email protected]
>> >     http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>> >
>>
>>
>> --
>> D. Juan Piernas Cánovas
>> Departamento de Ingeniería y Tecnología de Computadores
>> Facultad de Informática. Universidad de Murcia
>> Campus de Espinardo - 30080 Murcia (SPAIN)
>> Tel.: +34868887657    Fax: +34868884151
>> email: [email protected]
>> PGP public key:
>>
>> http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
>>
>> *** Please send me your documents in text, HTML, PDF, or
>> PostScript format :-) ***
>>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
