Re: [lustre-discuss] Do I need Lustre?

2018-04-27 Thread Brett Lee
Hi Neil,

One of the considerations in using Lustre should be the I/O patterns of
your applications.  Lustre excels with large, sequential reads and writes.

Another consideration is cost, including hardware, software, support, and
coming up to speed with Lustre.  These components interact.  For example,
having professional support helps with coming up to speed on Lustre. :)
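
Michael's ram-disk comparison (quoted below) is easy to script.  A minimal
sketch - the application command, sample data, and NFS path are
placeholders for whatever you actually run:

    # stage the same sample input in RAM and on the NFS server
    cp -r /data/sample /dev/shm/sample
    cp -r /data/sample /nfs/scratch/sample

    # time one run from each location
    time ./my_app /dev/shm/sample
    time ./my_app /nfs/scratch/sample

A large gap between the two timings means the workload is I/O-bound, and
faster storage should help.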

Hey Michael!


On Fri, Apr 27, 2018, 12:22 PM Hebenstreit, Michael <
michael.hebenstr...@intel.com> wrote:

> You can do a simple test. Run a small sample of your application directly
> out of /dev/shm (the RAM disk). Then run it from the NFS file server. If
> you measure significant speedups, your application is I/O-sensitive, and
> a Lustre system configured with OPA or another InfiniBand solution will
> help.
>
>
>
> *From:* lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] *On
> Behalf Of *Thackeray, Neil L
> *Sent:* Friday, April 27, 2018 11:08 AM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] Do I need Lustre?
>
>
>
> I’m new to the cluster realm, so I’m hoping for some good advice. We are
> starting up a new cluster, and I’ve noticed that lustre seems to be used
> widely in datacenters. The thing is I’m not sure the scale of our cluster
> will need it.
>
>
>
> We are planning a small cluster, starting with 6-8 nodes with 2 GPUs per
> node. They will be used for Deep Learning, MRI data processing, and Matlab
> among other things. With the size of the cluster we figure that 10Gb
> networking will be sufficient. We aren’t going to allow persistent storage
> on the cluster. Users will just upload and download data. I’m mostly
> concerned about I/O speeds. I don’t know if NFS would be fast enough to
> handle the data.
>
>
>
> We are hoping that the cluster will grow over time. We are already talking
> about buying more nodes next fiscal year.
>
>
>
> Thanks.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] error while configuring lnet

2017-11-12 Thread Brett Lee
Hi Parag - I've been away from this field for awhile, but this message may
be the most helpful at this point:

modprobe: ERROR: could not insert 'ko2iblnd': Invalid argument

Last I knew, there were two IB releases - Mellanox/MOFED and Intel
(formerly QLogic) - and the Lustre kernel you use needs to be compiled
against the IB release that is installed on the node.

So, I see a couple of possibilities for why the Lustre IB module will not load:
1.  Since you are using MOFED, the kernel you have may instead support
Intel IB (thus the invalid argument).
2.  One of the options provided may not be valid for the module, or may
conflict with another provided option:

require_privileged_port=0 use_privileged_port=0 timeout=150 retry_count=7
map_on_demand=32 peer_credits=63 concurrent_sends=63 ntx=32768
credits=32768 fmr_pool_size=8193

As item 1 is essentially a yes/no check, while item 2 is a matrix of
combinations, confirming that the kernel you have supports the module you
are trying to load seems the better starting point. :)

As Intel HPDD seems to provide a kernel that matches the version you are
using, item 1 seems likely.
https://downloads.hpdd.intel.com/public/lustre/lustre-2.9.0/el7.3.1611/server/RPMS/x86_64/
If that is the case, a couple options include using the Intel IB, or
compiling a Lustre kernel that supports the Mellanox OFED.
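
A quick way to check item 1 - a sketch, assuming the module path shown in
your modprobe output and that the MOFED userspace tools are installed:

    # what the Lustre LND module was built for
    modinfo /lib/modules/$(uname -r)/extra/lustre/net/ko2iblnd.ko | grep -E 'vermagic|depends'

    # what is actually on the node
    uname -r
    ofed_info -s    # prints the installed Mellanox OFED version

If the module was built against a different IB stack than the one
installed, insmod fails much like this, and dmesg usually names the
mismatched symbols.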

Hope this helps.
Brett

On Sun, Nov 12, 2017 at 1:16 AM, Parag Khuraswar <para...@citilindia.com>
wrote:

> Hi Brett,
>
>
>
> I am using MOFED “MLNX_OFED_LINUX-4.1-1.0.2.0” with kernel
> “3.10.0-514.el7.x86_64”
>
>
>
> o/p of modprobe -v ko2iblnd
>
>
>
> [root@mds1 ~]# modprobe -v ko2iblnd
>
> install /usr/sbin/ko2iblnd-probe require_privileged_port=0
> use_privileged_port=0 timeout=150 retry_count=7 map_on_demand=32
> peer_credits=63 concurrent_sends=63 ntx=32768 credits=32768
> fmr_pool_size=8193
>
> insmod /lib/modules/3.10.0-514.el7.x86_64/extra/lustre/net/ko2iblnd.ko
> require_privileged_port=0 use_privileged_port=0 timeout=150 retry_count=7
> map_on_demand=32 peer_credits=63 concurrent_sends=63 ntx=32768
> credits=32768 fmr_pool_size=8193
>
> modprobe: ERROR: could not insert 'ko2iblnd': Invalid argument
>
> modprobe: ERROR: Error running install command for ko2iblnd
>
> modprobe: ERROR: could not insert 'ko2iblnd': Operation not permitted
>
> [root@mds1 ~]#
>
>
>
>
>
> Regards,
>
> Parag
>
>
>
>
>
> *From:* Brett Lee [mailto:brettlee.lus...@gmail.com]
> *Sent:* Saturday, November 11, 2017 8:08 PM
> *To:* Parag Khuraswar
> *Cc:* Mannthey, Keith; lustre-discuss@lists.lustre.org
> *Subject:* Re: [lustre-discuss] error while configuring lnet
>
>
>
> Hi Parag,
>
>
>
> You may need to confirm that the in-kernel IB and the IB in the kernel
> module "match" (are compatible).  I think that loading the module (`sudo
> modprobe -v ko2iblnd`) may be sufficient to verify the match (it's been a
> while, others may correct me).
>
>
>
> Please indicate which kernel and which IB you are using.
>
>
> Brett
>
> --
>
> Protect yourself against cybercrime
>
> PDS Software Solutions
>
> https://www.TrustPDS.com <https://www.trustpds.com/>
>
>
>
> On Fri, Nov 10, 2017 at 8:30 PM, Parag Khuraswar <para...@citilindia.com>
> wrote:
>
> Hi Keith,
>
>
>
> Below errors I am getting while adding lnet and mounting mdt.
>
>
>
> dmesg logs while adding lnet
>
> =
>
> [317831.432182] LNetError: 28362:0:(api-ni.c:1861:lnet_startup_lndnet())
> Can't load LND o2ib, module ko2iblnd, rc=256
>
> =
>
>
>
>
>
>
>
> dmesg logs while mounting mdt
>
> ==
>
> [290476.172602] LNetError: 23040:0:(api-ni.c:1861:lnet_startup_lndnet())
> Can't load LND o2ib, module ko2iblnd, rc=256
>
> [317478.730515] LDISKFS-fs (dm-1): mounted filesystem with ordered data
> mode. Opts: errors=remount-ro
>
> [317480.166277] LDISKFS-fs (dm-1): mounted filesystem with ordered data
> mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
>
> [317480.313296] LustreError: 28268:0:(ldlm_lib.c:483:client_obd_setup())
> can't add initial connection
>
> [317480.313600] LustreError: 28268:0:(obd_config.c:608:class_setup())
> setup MGC10.2.1.204@o2ib failed (-2)
>
> [317480.313603] LustreError: 28268:0:(obd_mount.c:202:lustre_start_simple())
> MGC10.2.1.204@o2ib setup error -2
>
> [317480.313632] LustreError: 
> 28268:0:(obd_mount_server.c:1573:server_put_super())
> no obd home-MDT
>
> [317480.313635] LustreError: 
> 28268:0:(obd_mount_server.c:132:server_deregister_mount())
> home-MDT not registered
>

Re: [lustre-discuss] error while configuring lnet

2017-11-11 Thread Brett Lee
Hi Parag,

You may need to confirm that the in-kernel IB and the IB in the kernel
module "match" (are compatible).  I think that loading the module (`sudo
modprobe -v ko2iblnd`) may be sufficient to verify the match (it's been a
while, others may correct me).
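
A minimal sequence for that check - module first, then LNet, with the
interface name from your earlier mail:

    sudo modprobe -v ko2iblnd    # if this fails, dmesg has the real reason
    dmesg | tail
    lnetctl lnet configure
    lnetctl net add --net o2ib --if ib0
    lnetctl net show

"lnetctl net add --net o2ib" can only succeed once ko2iblnd loads cleanly,
so the modprobe step usually pinpoints the error.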

Please indicate which kernel and which IB you are using.

Brett
--
Protect yourself against cybercrime
PDS Software Solutions
https://www.TrustPDS.com 

On Fri, Nov 10, 2017 at 8:30 PM, Parag Khuraswar 
wrote:

> Hi Keith,
>
>
>
> Below errors I am getting while adding lnet and mounting mdt.
>
>
>
> dmesg logs while adding lnet
>
> =
>
> [317831.432182] LNetError: 28362:0:(api-ni.c:1861:lnet_startup_lndnet())
> Can't load LND o2ib, module ko2iblnd, rc=256
>
> =
>
>
>
>
>
>
>
> dmesg logs while mounting mdt
>
> ==
>
> [290476.172602] LNetError: 23040:0:(api-ni.c:1861:lnet_startup_lndnet())
> Can't load LND o2ib, module ko2iblnd, rc=256
>
> [317478.730515] LDISKFS-fs (dm-1): mounted filesystem with ordered data
> mode. Opts: errors=remount-ro
>
> [317480.166277] LDISKFS-fs (dm-1): mounted filesystem with ordered data
> mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
>
> [317480.313296] LustreError: 28268:0:(ldlm_lib.c:483:client_obd_setup())
> can't add initial connection
>
> [317480.313600] LustreError: 28268:0:(obd_config.c:608:class_setup())
> setup MGC10.2.1.204@o2ib failed (-2)
>
> [317480.313603] LustreError: 28268:0:(obd_mount.c:202:lustre_start_simple())
> MGC10.2.1.204@o2ib setup error -2
>
> [317480.313632] LustreError: 
> 28268:0:(obd_mount_server.c:1573:server_put_super())
> no obd home-MDT
>
> [317480.313635] LustreError: 
> 28268:0:(obd_mount_server.c:132:server_deregister_mount())
> home-MDT not registered
>
> [317480.433934] Lustre: server umount home-MDT complete
>
> [317480.433940] LustreError: 28268:0:(obd_mount.c:1504:lustre_fill_super())
> Unable to mount  (-2)
>
> ==
>
>
>
>
>
> Regards,
>
> Parag
>
>
>
>
>
> *From:* Mannthey, Keith [mailto:keith.mannt...@intel.com]
> *Sent:* Saturday, November 11, 2017 2:06 AM
>
> *To:* Parag Khuraswar; lustre-discuss@lists.lustre.org
> *Subject:* RE: [lustre-discuss] error while configuring lnet
>
>
>
> If you have ib0 device check dmesg for more hints on what is going wrong.
>
>
>
> Thanks,
>
> Keith
>
> *From:* Parag Khuraswar [mailto:para...@citilindia.com]
> *Sent:* Friday, November 10, 2017 10:59 AM
> *To:* Mannthey, Keith ;
> lustre-discuss@lists.lustre.org
> *Subject:* RE: [lustre-discuss] error while configuring lnet
>
>
>
> Hi,
>
>
>
> Basically I am trying to add an LNet network. The delete was just to try
> whether it would work or not.
>
> Mainly, I want to add the o2ib network, which is giving the error
> “Invalid argument”.
>
> ==
>
> [root@mds2 ~]# lnetctl net add --net o2ib --if ib0
>
> add:
>
> - net:
>
>   errno: -22
>
>   descr: "cannot add network: Invalid argument"
>
> 
>
> I am really not able to understand what argument is invalid in my command.
>
> I am able to ping ib0 network
>
>
>
> Regards,
>
> Parag
>
>
>
>
>
> *From:* Mannthey, Keith [mailto:keith.mannt...@intel.com
> ]
> *Sent:* Friday, November 10, 2017 10:51 PM
> *To:* Parag Khuraswar; lustre-discuss@lists.lustre.org
> *Subject:* RE: [lustre-discuss] error while configuring lnet
>
>
>
> What are you trying to accomplish?
>
>
>
> From below:
>
>
>
> 10.1.1.205@tcp is on 0@lo, not eno1, and in general you should not need
> the "--if" option to delete a fabric.
>
>
>
> Try: # lnetctl net del --net tcp
>
>
>
> Can you do a normal ping over ib0?
>
>
>
> “dmesg” can sometime provide greater details about errors like this.
>
>
>
> Thanks,
>
> Keith
>
>
>
>
>
> *From:* lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org
> ] *On Behalf Of *Parag Khuraswar
> *Sent:* Friday, November 10, 2017 9:10 AM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] error while configuring lnet
>
>
>
> Hi,
>
>
>
> I am trying to add lnet but getting below error.
>
> ==
>
> [root@mds2 ~]# lnetctl net show
>
> net:
>
> - net type: lo
>
>   local NI(s):
>
> - nid: 0@lo
>
>   status: up
>
> - net type: tcp
>
>   local NI(s):
>
> - nid: 10.1.1.205@tcp
>
>   status: up
>
> [root@mds2 ~]# lnetctl net add --net o2ib --if ib0
>
> add:
>
> - net:
>
>   errno: -22
>
>   descr: "cannot add network: Invalid argument"
>
> [root@mds2 ~]# lnetctl net del --net tcp --if eno1
>
> del:
>
> - net:
>
>   errno: -22
>
>   descr: "cannot del network: Invalid argument"
>
> [root@mds2 ~]# lctl list_nids
>
> 10.1.1.205@tcp
>
> [root@mds2 ~]#
>
> 
>

Re: [lustre-discuss] new client - failover mds: no connection

2017-10-24 Thread Brett Lee
Hi Thomas, nice to see you have remained active in the Lustre community.
To your question, I don't have an answer, but it seems like the timeout may
be masking the root issue - perhaps a system or network problem. I always
start with hostname resolution.  :)
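
For example, before raising any timeout, I would verify both MGS NIDs from
the new client and look at the current value:

    lctl ping 10.20.1.198@o2ib5
    lctl ping 10.20.1.199@o2ib5
    lctl get_param timeout    # the value behind /sys/fs/lustre/timeout

With the first servicenode down, the mount should still fall back to the
second NID - just not before the connection attempt to the first one times
out.
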
On Oct 24, 2017 11:08 AM, "Thomas Roth"  wrote:

> Sorry to have bothered you - works now.
>
> I have set /sys/fs/lustre/timeout=3000, quite brutally, to make things go
> verrry slowly, and after 25 minutes the mount was there.
>
> Which control aka timeout-parameter _should_ I have tuned instead in such
> a situation?
>
> Regards,
> Thomas
>
> On 10/24/2017 06:26 PM, Thomas Roth wrote:
>
>> Hi all,
>>
>> in a Lustre 2.10, CentOS 7.4 test system, I have a pair of MDS, format
>> command was
>>
>>  > mkfs.lustre --mgs --mdt --fsname=test --index=0
>> --servicenode=10.20.1.198@o2ib5 --servicenode=10.20.1.199@o2ib5
>>  --mgsnode=10.20.1.198@o2ib5 --mgsnode=10.20.1.199@o2ib5
>> /dev/drbd0
>>
>> I added some OSS and clients, everything working.
>>
>> Then I switched off 10.20.1.198 and mounted my MGS/MDT on 10.20.1.199.
>> All OSS and clients connected, everything working.
>>
>> Now I try to add a client that was never there before,
>>  > mount -t lustre 10.20.1.198@o2ib5:10.20.1.199@o2ib5:/test
>> /lustre/test
>>
>> But this client only tries to connect to 10.20.1.198@o2ib5 - and fails.
>> The log says
>>
>> LNet: 47655:0:(o2iblnd_cb.c:2672:kiblnd_check_reconnect())
>> 10.20.1.198@o2ib5: reconnect (invalid service id), 12, 12, msg_size:
>> 4096, queue_depth: 8/-1, max_frags: 256/-1
>> LNet: 47655:0:(o2iblnd_cb.c:2698:kiblnd_rejected()) 10.20.1.198@o2ib5
>> rejected: no listener at 987
>> ...
>> LustreError: 48560:0:(mgc_request.c:251:do_config_log_add())
>> MGC10.20.1.198@o2ib5: failed processing log, type 1: rc = -5
>> LNet: 48427:0:(o2iblnd_cb.c:3207:kiblnd_check_conns()) Timed out tx for
>> 10.20.1.198@o2ib5: 4301501 seconds
>> Lustre: 48441:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request
>> sent has failed due to network error: [sent 1508861258/real 1508861264]
>> req@88103dc78000 x1582155623825424/t0(0) o250->MGC10.20.1.198@o2ib5
>> @10.20.1.198@o2ib5:26/25 lens 520/544 e 0 to 1 dl 1508861408 ref 1 fl
>> Rpc:eXN/0/ rc 0/-1
>>
>>
>> all of which seems logical but not wanted - where is my 10.20.1.199@o2ib5
>> ?
>>
>> Of course I can 'lctl ping 10.20.1.199@o2ib5'.
>> And I have since umounted on one of the older clients, unloaded the
>> Lustre modules, and mounted again - works.
>>
>>
>> Regards,
>> Thomas
>>
>>
> --
> 
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 1.250
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>
> Geschäftsführung: Ursula Weyrich
> Professor Dr. Paolo Giubellino
> Jörg Blaurock
>
> Vorsitzende des Aufsichtsrates: Staatssekretär Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Cannot mount lustre filesystem anymore

2017-04-25 Thread Brett Lee
"No such file or directory."

Could that be the cause?
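
The hints printed after that error map to a couple of quick checks, using
the MGS NID and device from your mail:

    lctl ping 10.163.85.99@tcp                  # is the MGS reachable on the configured net?
    tunefs.lustre --dryrun /dev/mapper/seqdata  # shows the fsname and MGS NIDs recorded on the OST

If those values look right, the failed copy of the MGS log (rc = -14) may
point at a configuration log that needs regenerating with --writeconf, but
the cheap checks come first.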

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com
On Apr 25, 2017 4:25 AM, "Stefano Turolla" 
wrote:

> Dear all, I am a newbie in Lustre. I set up a simple configuration to
> mount a filesystem from a Dell PowerVault MD3800i (iSCSI + multipath
> enabled). It was working properly, but after the last reboot I cannot
> mount the Lustre filesystem anymore.
> I am running Lustre on kernel 3.10.0, on Scientific Linux 7.3.
> I put the MDT/MDS on the server together with the OST.
>
> Here is the relevant /etc/fstab
>
> # Lustre MDT / MDS (manage filenames, directories etc) and block devices
> /dev/sda1               /mnt/lustre-mdt-mds  lustre  noauto,_netdev  0 0
> /dev/mapper/seqdata     /mnt/lustre-ost      lustre  noauto,_netdev  0 0
> # Lustre Client
> master-mds@tcp:/seqdata /seq_data            lustre  noauto,_netdev  0 0
> I can mount the /mnt/lustre-mdt-mds filesystem but not the OST, and of
> course no client.
>
>
> here are the devices
> [root@newmaster lustre]# cat /proc/fs/lustre/devices
>   0 UP osd-ldiskfs seqdata-MDT-osd seqdata-MDT-osd_UUID 9
>   1 UP mgs MGS MGS 5
>   2 UP mgc MGC10.163.85.99@tcp 69e92317-78f6-eef7-1764-57da5aadafe2 5
>   3 UP mds MDS MDS_uuid 3
>   4 UP lod seqdata-MDT-mdtlov seqdata-MDT-mdtlov_UUID 4
>   5 UP mdt seqdata-MDT seqdata-MDT_UUID 5
>   6 UP mdd seqdata-MDD seqdata-MDD_UUID 4
>   7 UP qmt seqdata-QMT seqdata-QMT_UUID 4
>   8 UP osp seqdata-OST-osc-MDT seqdata-MDT-mdtlov_UUID 5
>   9 UP lwp seqdata-MDT-lwp-MDT seqdata-MDT-lwp-MDT_UUID 5
>
> Here are the errors when I try to mount the OST
>
> [root@newmaster lustre]# mount /mnt/lustre-ost
>
> Apr 25 12:13:41 newmaster kernel: LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5
> Apr 25 12:13:42 newmaster kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache,nodelalloc
> Apr 25 12:13:42 newmaster kernel: LustreError: 11242:0:(llog_osd.c:246:llog_osd_read_header()) seqdata-OST-osd: error reading [0xa:0x14:0x0] log header size 8192: rc = -14
> Apr 25 12:13:42 newmaster kernel: LustreError: 11242:0:(llog_osd.c:246:llog_osd_read_header()) Skipped 1 previous similar message
> Apr 25 12:13:42 newmaster kernel: LustreError: 11242:0:(mgc_request.c:1832:mgc_llog_local_copy()) MGC10.163.85.99@tcp: failed to copy remote log seqdata-client: rc = -14
> Apr 25 12:13:42 newmaster kernel: LustreError: 13a-8: Failed to get MGS log seqdata-client and no local copy.
> Apr 25 12:13:42 newmaster kernel: LustreError: 15c-8: MGC10.163.85.99@tcp: The configuration from log 'seqdata-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> Apr 25 12:13:42 newmaster kernel: LustreError: 11242:0:(obd_mount_server.c:1369:server_start_targets()) seqdata-OST: failed to start LWP: -2
> Apr 25 12:13:42 newmaster kernel: LustreError: 11242:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -2
> Apr 25 12:13:42 newmaster kernel: Lustre: Failing over seqdata-OST
> Apr 25 12:13:42 newmaster kernel: Lustre: server umount seqdata-OST complete
> Apr 25 12:13:42 newmaster kernel: LustreError: 11242:0:(obd_mount.c:1449:lustre_fill_super()) Unable to mount  (-2)
>
> mount.lustre: mount /dev/mapper/seqdata at /mnt/lustre-ost failed: No such file or directory
> Is the MGS specification correct? Is the filesystem name correct? If upgrading, is the copied client log valid? (see upgrade docs)
>
> Here is the LNet configuration; currently only the server is listed:
>
> [root@newmaster lustre]# cat /etc/modprobe.d/lustre.conf
> options lnet networks=tcp0(eth2)
> I tried to search in the mailing list some good explanation of what
> happened but I could not find any.
> Could you please help me to debug the problem?
> Thanks a lot in advance
> Stefano Turolla
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] client fails to mount

2017-04-24 Thread Brett Lee
So, the LNet ping is not working, and LNet is running on IB.  Have you
moved down the stack toward the hardware, running an ibping from a rebooted
client to the MGS?
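
A sketch of that check with ibping (from infiniband-diags); the LID is a
placeholder for your fabric:

    # on the MGS, start the ibping responder
    ibping -S

    # on the rebooted client, ping the MGS HCA by LID (see ibstat on the MGS)
    ibping <MGS-LID>

If ibping fails while "lctl ping" to compute nodes still works, the
problem is likely at the fabric level (subnet manager, partition, cabling)
rather than in Lustre.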

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com 

On Mon, Apr 24, 2017 at 11:53 AM, Raj  wrote:

> Yes, this is strange. Normally, I have seen a credits mismatch result in
> this scenario, but it doesn't look like that is the case here.
>
> You wouldn't want to put the MGS into debug-message capture, as there
> will be a lot of data.
>
> I guess you already tried removing the Lustre drivers and adding them
> again?
> lustre_rmmod
> modprobe -v lustre
>
> And check dmesg for any errors...
>
>
> On Mon, Apr 24, 2017 at 12:43 PM Strikwerda, Ger 
> wrote:
>
>> Hi Raj,
>>
>> When i do a lctl ping on a MGS server i do not see any logs at all. Also
>> not when i do a sucessfull ping from a working node. Is there a way to
>> verbose the Lustre logging to see more detail on the LNET level?
>>
>> It is very strange that a rebooted node is able to lctl ping compute
>> nodes, but fails to lctl ping metadata and storage nodes.
>>
>>
>>
>>
>> On Mon, Apr 24, 2017 at 7:35 PM, Raj  wrote:
>>
>>> Ger,
>>> It looks like default configuration of lustre.
>>>
>>> Do you see any error message on the MGS side while you are doing lctl
>>> ping from the rebooted clients?
>>> On Mon, Apr 24, 2017 at 12:27 PM Strikwerda, Ger <
>>> g.j.c.strikwe...@rug.nl> wrote:
>>>
 Hi Eli,

 Nothing can be mounted on the Lustre filesystems so the output is:

 [root@pg-gpu01 ~]# lfs df /home/ger/
 [root@pg-gpu01 ~]#

 Empty..



 On Mon, Apr 24, 2017 at 7:24 PM, E.S. Rosenberg 
 wrote:

>
>
> On Mon, Apr 24, 2017 at 8:19 PM, Strikwerda, Ger <
> g.j.c.strikwe...@rug.nl> wrote:
>
>> Hallo Eli,
>>
>> Logfile/syslog on the client-side:
>>
>> Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-
>> 573.el6.x86_64
>> LNet: Added LNI 172.23.54.51@o2ib [8/256/0/180]
>> LNetError: 2878:0:(o2iblnd_cb.c:2587:kiblnd_rejected())
>> 172.23.55.211@o2ib rejected: consumer defined fatal error
>>
>
> lctl df /path/to/some/file
>
> gives nothing useful? (the second one will dump *a lot*)
>
>>
>>
>>
>>
>> On Mon, Apr 24, 2017 at 7:16 PM, E.S. Rosenberg <
>> esr+lus...@mail.hebrew.edu> wrote:
>>
>>>
>>>
>>> On Mon, Apr 24, 2017 at 8:13 PM, Strikwerda, Ger <
>>> g.j.c.strikwe...@rug.nl> wrote:
>>>
 Hi Raj (and others),

 In which file should i state the credits/peer_credits stuff?

 Perhaps relevant config-files:

 [root@pg-gpu01 ~]# cd /etc/modprobe.d/

 [root@pg-gpu01 modprobe.d]# ls
 anaconda.conf           blacklist-kvm.conf      dist-alsa.conf
 dist-oss.conf           ib_ipoib.conf           lustre.conf
 openfwwf.conf           blacklist.conf          blacklist-nouveau.conf
 dist.conf               freeipmi-modalias.conf  ib_sdp.conf
 mlnx.conf               truescale.conf

 [root@pg-gpu01 modprobe.d]# cat ./ib_ipoib.conf
 alias netdev-ib* ib_ipoib

 [root@pg-gpu01 modprobe.d]# cat ./mlnx.conf
 # Module parameters for MLNX_OFED kernel modules

 [root@pg-gpu01 modprobe.d]# cat ./lustre.conf
 options lnet networks=o2ib(ib0)

 Are there more Lustre/LNET options that could help in this
 situation?

>>>
>>> What about the logfiles?
>>> Any error messages in syslog? lctl debug options?
>>> Veel geluk,
>>> Eli
>>>




 On Mon, Apr 24, 2017 at 7:02 PM, Raj  wrote:

> May be worth checking your lnet credits and peer_credits in
> /etc/modprobe.d ?
> You can compare between working hosts and non working hosts.
> Thanks
> _Raj
>
> On Mon, Apr 24, 2017 at 10:10 AM Strikwerda, Ger <
> g.j.c.strikwe...@rug.nl> wrote:
>
>> Hi Rick,
>>
>> Even without iptables rules and loading the correct modules
>> afterwards, we get the same results:
>>
>> [root@pg-gpu01 sysconfig]# iptables --list
>> Chain INPUT (policy ACCEPT)
>> target prot opt source   destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target prot opt source   destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target prot opt source   destination
>>
>> Chain LOGDROP (0 references)
>> target prot opt source   destination
>> LOG    all  --  anywhere             anywhere             LOG level warning

Re: [lustre-discuss] Backup software for Lustre

2017-03-23 Thread Brett Lee
> I didn't see anything in your strace logs that would indicate the lustre
> xattrs were "restored",
> rather that they were "created" (as they always are) when the files or
> directories are created.
>
>
Agreed.  The strace output I provided was for the "tar zxvf /scratch.tgz"
command that did not extract the xattrs.  As you wrote, in that scenario
the xattrs were created when the files and directories were
extracted/created.


"trusted.lov"   - holds layout for a regular file, or the directory default
> layout (== lustre.lov)
> "trusted.lma"   - holds FID for current file, some flags
> "trusted.lmv"   - holds layout for a striped directory (DNE 2), not
> present otherwise
> "trusted.link"  - holds parent directory FID+filename for each link to a
> file (for lfs fid2path)
>
>
Andreas, time and time again, you are always such a big help in
understanding Lustre in depth.  Thank you very much for putting together
this list of xattrs!  It's nice to have a better understanding of how many
exist, what they hold, and their names.  I hope to see them soon in the
Lustre manual - LUDOC-373 <https://jira.hpdd.intel.com/browse/LUDOC-373>
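
For anyone wanting to look at these directly - the path is a placeholder,
and the trusted.* namespace needs root:

    # raw xattrs as Lustre stores them
    getfattr -e hex -m trusted -d /mnt/lustre/somefile

    # the lov layout, decoded
    lfs getstripe -v /mnt/lustre/somefile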

Regards,
Brett


> Cheers, Andreas
>
> > On Mon, Mar 20, 2017 at 1:51 AM, Dilger, Andreas <
> andreas.dil...@intel.com> wrote:
> >> On Mar 19, 2017, at 20:57, Brett Lee <brettlee.lus...@gmail.com> wrote:
> >>>
> >>> Help from down under! :)  Thanks Malcolm!!
> >>>
> >>> Using the correct extract syntax, both directories and files restored
> with the expected stripe size and stripe count.  This means that Zmanda
> should be able to work as a backup solution for some Lustre file systems,
> especially if able to using multiple per-directory "agents" to perform
> backups in parallel.
> >>
> >> I tested this on RHEL7, and it seems tar needs a bit of encouragement
> to restore Lustre xattrs:
> >>
> >>tar xvf  --xattrs-include="lustre.*"
> >>
> >> This will restore the important "lustre.lov" xattr with the file layout
> (Lustre itself discards the other "lustre.*" xattrs silently) for regular
> files via mknod+setxattr.  It still doesn't restore the default lustre.lov
> xattr onto the directory, even though it backed those xattrs up.
> >>
> >> Unfortunately, I didn't see the tar process even trying to access any
> config file like /etc/xattr.conf or /etc/tar.conf whatever, so I don't
> think there is any way to make this the default without recompiling tar.
> Sadly, I thought we'd gotten away from this once RHEL6 had patched their
> tar and GNU tar had added xattr support natively in 1.27, but it seems that
> we'll need to patch it again.  One option would be to add "lustre.*" to the
> xattrs-include list if extracting to a Lustre filesystem (e.g. checked via
> statfs() at the start).
> >>
> >> Cheers, Andreas
> >>
> >>> Go Lustre!
> >>>
> >>> Folks, my bad for only checking one source - the 3rd hit via Google
> for "backup restore xattrs tar":
> >>> https://www.cyberciti.biz/faq/linux-tar-rsync-preserving-
> acls-selinux-contexts/
> >>> I wasn't aware that "tar" was so dynamic in this respect.  Alas, this
> again justifies the adage "trust, but verify" - and especially with
> backups.
> >>>
> >>> Brett
> >>> --
> >>> Protect Yourself Against Cybercrime
> >>> PDS Software Solutions LLC
> >>> https://www.TrustPDS.com
> >>>
> >>> On Sun, Mar 19, 2017 at 3:34 PM, Cowe, Malcolm J <
> malcolm.j.c...@intel.com> wrote:
> >>> The version of tar included in RHEL 7 doesn’t restore the lustre
> xattrs by default – you can use the following to extract files with the
> requisite xattrs:
> >>>
> >>>
> >>>
> >>> tar --xattrs-include=lustre.* -xf .tar
> >>>
> >>>
> >>>
> >>> This assumes the files were backed up with the --xattrs flag:
> >>>
> >>>
> >>>
> >>> tar --xattrs -cf .tar 
> >>>
> >>>
> >>>
> >>> Note, that you don’t appear to need to whitelist the Lustre xattrs
> when backing up, only when restoring.
> >>>
> >>>
> >>>
> >>> Malcolm.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of "Dil

Re: [lustre-discuss] lfs_migrate

2017-03-20 Thread Brett Lee
Hi Eli,

I believe that is still the case with lfs_migrate.  If not, we'll probably
hear about it soon.

You should be able to disable those OSTs while keeping the file system
active - via a command on the MDS(s) as well as the clients.  My notes have
the command as shown below, but please confirm via the appropriate Lustre
manual:

lctl set_param osc.<fsname>-OST<index>-*.active=0
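
As a concrete sketch (fsname "testfs" and OST index 0004 are examples),
the usual drain sequence is:

    # on the MDS: stop new object allocations on the OST
    lctl set_param osc.testfs-OST0004-*.active=0

    # on a client: move existing files off that OST
    lfs find /mnt/testfs --obd testfs-OST0004_UUID | lfs_migrate -y

    # on the MDS, after maintenance: re-enable it
    lctl set_param osc.testfs-OST0004-*.active=1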

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com 

On Mon, Mar 20, 2017 at 10:43 AM, E.S. Rosenberg  wrote:

> In the man page it says the following:
>
> Because  lfs_migrate  is  not yet closely integrated with the MDS, it
> cannot determine whether a file is currently open and/or in-use by other
> applications or nodes.  That makes it UNSAFE for use on files that might be
> modified by other applications, since the migrated file is only a copy of
> the current file, and this will result in the old file becoming an
> open-unlinked file and any  modifications to that file will be lost.
>
> Is this still the case?
> Is there a better way to disable OSTs while keeping the filesystem live?
>
> Background:
> We need to take a OSS enclosure that hosts multiple OSTs offline for
> hardware maintenance, I'd like to do this without bringing Lustre as a
> whole down. I made sure there is enough space on the other OSTs to house
> the contents of the machine going offline and am now about to move things.
>
> Thanks,
> Eli
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Backup software for Lustre

2017-03-20 Thread Brett Lee
Ah, yes.  Thanks for correcting my statement, Andreas.  I'd like to retract
my last email - was probably too happy seeing the striping patterns return
after extracting the tar file. ;)

Andreas, related to your statement about backing up but not restoring the
lustre.lov for directories, I'm curious what accounts for (in my tests,
anyway) the directories being restored with their non-default striping
patterns intact?  I'm not aware of all the Lustre xattrs and what they
provide. Thanks.

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com <https://www.trustpds.com/>

On Mon, Mar 20, 2017 at 1:51 AM, Dilger, Andreas <andreas.dil...@intel.com>
wrote:

> On Mar 19, 2017, at 20:57, Brett Lee <brettlee.lus...@gmail.com> wrote:
> >
> > Help from down under! :)  Thanks Malcolm!!
> >
> > Using the correct extract syntax, both directories and files restored
> with the expected stripe size and stripe count.  This means that Zmanda
> should be able to work as a backup solution for some Lustre file systems,
> especially if able to using multiple per-directory "agents" to perform
> backups in parallel.
>
> I tested this on RHEL7, and it seems tar needs a bit of encouragement to
> restore Lustre xattrs:
>
> tar xvf  --xattrs-include="lustre.*"
>
> This will restore the important "lustre.lov" xattr with the file layout
> (Lustre itself discards the other "lustre.*" xattrs silently) for regular
> files via mknod+setxattr.  It still doesn't restore the default lustre.lov
> xattr onto the directory, even though it backed those xattrs up.
>
> Unfortunately, I didn't see the tar process even trying to access any
> config file like /etc/xattr.conf or /etc/tar.conf whatever, so I don't
> think there is any way to make this the default without recompiling tar.
> Sadly, I thought we'd gotten away from this once RHEL6 had patched their
> tar and GNU tar had added xattr support natively in 1.27, but it seems that
> we'll need to patch it again.  One option would be to add "lustre.*" to the
> xattrs-include list if extracting to a Lustre filesystem (e.g. checked via
> statfs() at the start).
>
> Cheers, Andreas
>
> > Go Lustre!
> >
> > Folks, my bad for only checking one source - the 3rd hit via Google for
> "backup restore xattrs tar":
> > https://www.cyberciti.biz/faq/linux-tar-rsync-preserving-acl
> s-selinux-contexts/
> > I wasn't aware that "tar" was so dynamic in this respect.  Alas, this
> again justifies the adage "trust, but verify" - and especially with
> backups.
> >
> > Brett
> > --
> > Protect Yourself Against Cybercrime
> > PDS Software Solutions LLC
> > https://www.TrustPDS.com
> >
> > On Sun, Mar 19, 2017 at 3:34 PM, Cowe, Malcolm J <
> malcolm.j.c...@intel.com> wrote:
> > The version of tar included in RHEL 7 doesn’t restore the lustre xattrs
> by default – you can use the following to extract files with the requisite
> xattrs:
> >
> >
> >
> > tar --xattrs-include=lustre.* -xf .tar
> >
> >
> >
> > This assumes the files were backed up with the --xattrs flag:
> >
> >
> >
> > tar --xattrs -cf .tar 
> >
> >
> >
> > Note, that you don’t appear to need to whitelist the Lustre xattrs when
> backing up, only when restoring.
> >
> >
> >
> > Malcolm.
> >
> >
> >
> >
> >
> > From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of "Dilger, Andreas" <andreas.dil...@intel.com>
> > Date: Monday, 20 March 2017 at 8:11 am
> > To: Brett Lee <brettlee.lus...@gmail.com>
> > Cc: Lustre discussion <lustre-discuss@lists.lustre.org>
> >
> >
> > Subject: Re: [lustre-discuss] Backup software for Lustre
> >
> >
> >
> > The use of openat() could be problematic since this precludes storing
> the xattr before the file is opened. That said, I don't see anywhere in
> your strace log that (f)setxattr() is called to restore the xattrs, for
> either the regular files or directories, even after the file is opened or
> written?
> >
> >
> >
> > Does the RHEL tar have a whitelist of xattrs to be restored?  The fact
> that there are Lustre xattrs after the restore appears to just be normal
> behavior for creating a file, not anything related to tar restoring xattrs.
> >
> > Cheers, Andreas
> >
> >
> >
> > On Mar 19, 2017, at 10:45, Brett Lee <brettlee.lus...@gmail.com> wrote:
> >
> > Sure, happy to help.  I did not see mkn

Re: [lustre-discuss] Backup software for Lustre

2017-03-19 Thread Brett Lee
Help from down under! :)  Thanks Malcolm!!

Using the correct extract syntax, both directories and files restored with
the expected stripe size and stripe count.  This means that Zmanda should
be able to work as a backup solution for some Lustre file systems,
especially if able to using multiple per-directory "agents" to perform
backups in parallel.

Go Lustre!

Folks, my bad for only checking one source - the 3rd hit via Google for
"backup restore xattrs tar":
https://www.cyberciti.biz/faq/linux-tar-rsync-preserving-acls-selinux-contexts/
I wasn't aware that "tar" was so dynamic in this respect.  Alas, this again
justifies the adage "trust, but verify" - and especially with backups.

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com <https://www.trustpds.com/>

On Sun, Mar 19, 2017 at 3:34 PM, Cowe, Malcolm J <malcolm.j.c...@intel.com>
wrote:

> The version of tar included in RHEL 7 doesn’t restore the lustre xattrs by
> default – you can use the following to extract files with the requisite
> xattrs:
>
>
>
> tar --xattrs-include=lustre.* -xf .tar
>
>
>
> This assumes the files were backed up with the --xattrs flag:
>
>
>
> tar --xattrs -cf .tar 
>
>
>
> Note, that you don’t appear to need to whitelist the Lustre xattrs when
> backing up, only when restoring.
>
>
>
> Malcolm.
>
>
>
>
>
> *From: *lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of "Dilger, Andreas" <andreas.dil...@intel.com>
> *Date: *Monday, 20 March 2017 at 8:11 am
> *To: *Brett Lee <brettlee.lus...@gmail.com>
> *Cc: *Lustre discussion <lustre-discuss@lists.lustre.org>
>
> *Subject: *Re: [lustre-discuss] Backup software for Lustre
>
>
>
> The use of openat() could be problematic since this precludes storing the
> xattr before the file is opened. That said, I don't see anywhere in your
> strace log that (f)setxattr() is called to restore the xattrs, for either
> the regular files or directories, even after the file is opened or written?
>
>
>
>
> Does the RHEL tar have a whitelist of xattrs to be restored?  The fact
> that there are Lustre xattrs after the restore appears to just be normal
> behavior for creating a file, not anything related to tar restoring xattrs.
>
> Cheers, Andreas
>
>
>
> On Mar 19, 2017, at 10:45, Brett Lee <brettlee.lus...@gmail.com> wrote:
>
> Sure, happy to help.  I did not see mknod+setxattr in the strace output.
> Included is a trimmed version of the strace output, along with a few more
> bits of information.  Thanks!
>
> # cat /proc/fs/lustre/version
> lustre: 2.7.19.8
> # cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> # uname -r
> 3.10.0-514.2.2.el7_lustre.x86_64
> # rpm -qa|grep tar
> tar-1.26-31.el7.x86_64
> # sha1sum `which tar` `which gtar`
> ea17ec98894212b2e2285eb2dd99aad76185ea7d  /usr/bin/tar
> ea17ec98894212b2e2285eb2dd99aad76185ea7d  /usr/bin/gtar
>
> Striping was set on the four directories before creating the files.
> mkdir -p /scratch/1; lfs setstripe -c 1 --stripe-size 128K /scratch/1; lfs
> getstripe /scratch/1
> mkdir -p /scratch/2; lfs setstripe -c 2 --stripe-size 256K /scratch/2; lfs
> getstripe /scratch/2
> mkdir -p /scratch/3; lfs setstripe -c 3 --stripe-size 768K /scratch/3; lfs
> getstripe /scratch/3
> mkdir -p /scratch/4; lfs setstripe -c 4 --stripe-size 1M /scratch/4;
> lfs getstripe /scratch/4
> After tar, all files and directories all had the default Lustre striping.
>
> # tar ztvf /scratch.tgz
> drwxr-xr-x root/root 0 2017-03-19 10:54 scratch/
> drwxr-xr-x root/root 0 2017-03-19 10:57 scratch/4/
> -rw-r--r-- root/root   4194304 2017-03-19 10:57 scratch/4/4.dd
> drwxr-xr-x root/root 0 2017-03-19 10:57 scratch/3/
> -rw-r--r-- root/root   4194304 2017-03-19 10:57 scratch/3/3.dd
> drwxr-xr-x root/root 0 2017-03-19 10:57 scratch/1/
> -rw-r--r-- root/root   4194304 2017-03-19 10:57 scratch/1/1.dd
> drwxr-xr-x root/root 0 2017-03-19 10:57 scratch/2/
> -rw-r--r-- root/root   4194304 2017-03-19 10:57 scratch/2/2.dd
>
> # strace tar zxvf /scratch.tgz > strace.out 2>&1
> execve("/usr/bin/tar", ["tar", "zxvf", "/scratch.tgz"], [/* 22 vars */]) =
> 0
> ...
> (-cut - loading libraries)
> ...
> fstat(1, {st_mode=S_IFREG|0644, st_size=10187, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7f4a63d9f000
> write(1, "scratch/\n", 9scratch/
> )   = 9
> mkdirat(AT_FDCWD, "scratch", 0700)  = -1 EEXIST (File exists)
> newfstatat(AT_FDCWD, "

Re: [lustre-discuss] Backup software for Lustre

2017-03-19 Thread Brett Lee
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10240) = 10240
read(3,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10240) = 10240
write(4,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10240) = 10240
read(3,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10240) = 10240
write(4,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10240) = 10240
read(3,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10240) = 10240
write(4,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
7680) = 7680
dup2(4, 4)  = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=4194304, ...}) = 0
utimensat(4, NULL, {{1489935825, 0}, {1489935432, 0}}, 0) = 0
fchown(4, 0, 0) = 0
fchmod(4, 0644) = 0
close(4)= 0
clock_gettime(CLOCK_REALTIME, {1489935825, 628399394}) = 0
clock_gettime(CLOCK_REALTIME, {1489935825, 628414336}) = 0
close(3)= 0
wait4(2476, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2476
newfstatat(AT_FDCWD, "scratch/2", {st_mode=S_IFDIR|0700, st_size=4096,
...}, AT_SYMLINK_NOFOLLOW) = 0
utimensat(AT_FDCWD, "scratch/2", {{1489935825, 0}, {1489935432, 0}},
AT_SYMLINK_NOFOLLOW) = 0
fchownat(AT_FDCWD, "scratch/2", 0, 0, AT_SYMLINK_NOFOLLOW) = 0
fchmodat(AT_FDCWD, "scratch/2", 0755)   = 0
newfstatat(AT_FDCWD, "scratch", {st_mode=S_IFDIR|0755, st_size=4096, ...},
0) = 0
utimensat(AT_FDCWD, "scratch", {{1489934977, 0}, {1489935261, 0}}, 0) = 0
fchownat(AT_FDCWD, "scratch", 0, 0, 0)  = 0
close(1)= 0
munmap(0x7f4a63d9f000, 4096)= 0
close(2)= 0
exit_group(0)   = ?
+++ exited with 0 +++


Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com <https://www.trustpds.com/>

On Sun, Mar 19, 2017 at 7:39 AM, Dilger, Andreas <andreas.dil...@intel.com>
wrote:

> I ran a test locally with RHEL 6.8 and the included tar 1.26 using strace,
> and tar is properly using mknod+setxattr to restore the "lov" xattr, and
> the stripe count and stripe size to be preserved.
>
> The OST index is not preserved with the xattr restore, since that may
> cause imbalance if the  files were backed up in a different filesystem
> (e.g. one with fewer OSTs).  The MDS will balance OST allocation as needed
> for the current OST usage.
>
> Could you please run your tar on RHEL 7 with strace to see if it is doing
> this correctly?
>
> Cheers, Andreas
>
> On Mar 18, 2017, at 21:51, Brett Lee <brettlee.lus...@gmail.com> wrote:
>
> Hi Andreas, I expected that to be the case, but found out it was not.
> Instead, the restore restores everything - unless directed otherwise.
>
> Backup == cmd + add xattrs.
> Restore == cmd + exclude xattrs.
>
> Brett
> --
> Protect Yourself Against Cybercrime
> PDS Software Solutions LLC
> https://www.TrustPDS.com
> On Mar 18, 2017 9:28 PM, "Dilger, Andreas" <andreas.dil...@intel.com>
> wrote:
>
>> Do you need to specify --xattrs (or similar) during the restore phase as
>> well?
>>
>> Cheers, Andreas
>>
>> On Mar 17, 2017, at 15:12, Brett Lee <brettlee.lus...@gmail.com> wrote:
>>
>> Hi.  In what I thought was a valid test, I was unable to confirm that a
>> backup and restore retained the layouts.  Perhaps my expectation or process
>> was incorrect?  The process was:
>>
>> 1.  Create 4 files, each with different stripe sizes and stripe counts
>> (verified with getstripe).
>> 2.  Back up the files using tar-1.26-31.el7.x86_64.
>> 3.  Recreate a file system and restore the files.
>>
>> Backup command:  tar --xattrs -zcvf /scratch.tgz /scratch
>> Restore command:  tar zxvf /scratch.tgz
>>
>> After restoration, getstripe showed that each file had the default stripe
>> count (1) and stripe size (1MB).
>> FWIW:  After restoring, getfattr produced the same result for each file:
>> # getfattr -d -m - -R 
>> lustre.lov=0s0AvRCwEdAAAEAAACAAAQAAEFAAA
>> =
>> trusted.link=0s3/HqEQEuABYCAAAEA
>> AUAMS5kZA==
>> trusted.lma=0sBgAAAB0A
>> trusted.lov=0s0AvRCwEdAAAEAAACAAAQAAEFAA
>> A=
>>
>> Brett
>> --
>> Protect Yourself Against Cybercrime
>> PDS Software Solutions LLC
>> https://www.

Re: [lustre-discuss] Backup software for Lustre

2017-03-18 Thread Brett Lee
Hi Andreas, I expected that to be the case, but found out it was not.
Instead, the restore restores everything - unless directed otherwise.

Backup == cmd + add xattrs.
Restore == cmd + exclude xattrs.

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com
On Mar 18, 2017 9:28 PM, "Dilger, Andreas" <andreas.dil...@intel.com> wrote:

> Do you need to specify --xattrs (or similar) during the restore phase as
> well?
>
> Cheers, Andreas
>
> On Mar 17, 2017, at 15:12, Brett Lee <brettlee.lus...@gmail.com> wrote:
>
> Hi.  In what I thought was a valid test, I was unable to confirm that a
> backup and restore retained the layouts.  Perhaps my expectation or process
> was incorrect?  The process was:
>
> 1.  Create 4 files, each with different stripe sizes and stripe counts
> (verified with getstripe).
> 2.  Back up the files using tar-1.26-31.el7.x86_64.
> 3.  Recreate a file system and restore the files.
>
> Backup command:  tar --xattrs -zcvf /scratch.tgz /scratch
> Restore command:  tar zxvf /scratch.tgz
>
> After restoration, getstripe showed that each file had the default stripe
> count (1) and stripe size (1MB).
> FWIW:  After restoring, getfattr produced the same result for each file:
> # getfattr -d -m - -R 
> lustre.lov=0s0AvRCwEdAAAEAAACAAAQAAEFAA
> A=
> trusted.link=0s3/HqEQEu
> ABYCAAAEAAUAMS5kZA==
> trusted.lma=0sBgAAAB0A
> trusted.lov=0s0AvRCwEdAAAEAAACAAAQAAEFAA
> A=
>
> Brett
> --
> Protect Yourself Against Cybercrime
> PDS Software Solutions LLC
> https://www.TrustPDS.com <https://www.trustpds.com/>
>
> On Wed, Mar 15, 2017 at 5:03 AM, Dilger, Andreas <andreas.dil...@intel.com
> > wrote:
>
>> I believe Zmanda is already using GNU tar (or RHEL tar) for the actual
>> backup storage?  In that case it should already work, since we fixed tar
>> long ago to backup and restore xattrs in a way that preserves Lustre
>> layouts.
>>
>> Cheers, Andreas
>>
>> On Mar 14, 2017, at 15:47, Brett Lee <brettlee.lus...@gmail.com> wrote:
>>
>> Thanks for the details, Andreas!
>>
>> Maybe OpenSFS can fund Zmanda so that their backup software can include
>> the Lustre metadata... :)
>>
>> Brett
>> --
>> Protect Yourself Against Cybercrime
>> PDS Software Solutions LLC
>> https://www.TrustPDS.com <https://www.trustpds.com/>
>>
>> On Tue, Mar 14, 2017 at 3:13 PM, Dilger, Andreas <
>> andreas.dil...@intel.com> wrote:
>>
>>> To reply to this old thread, there are two different kinds of Lustre
>>> backup solutions:
>>> - file level backups that traverse the client POSIX filesystem, for
>>> which any number of
>>>   commercial solutions exist.  Making these solutions "capable of saving
>>> Lustre metadata"
>>>   boils down to two simple things - save the "lustre.lov" xattr during
>>> backup (at a minimum,
>>>   other xattrs also should be backed up), and then using mknod(2) +
>>> setxattr() to restore
>>>   the "lustre.lov" xattr before opening the file and restoring the data.
>>>
>>> - device level backups (e.g. "dd" for ldiskfs, and "zfs send/recv" for
>>> ZFS).
>>>
>>> Using the file level backups allows backup/restore of subsets of the
>>> filesystem, since many
>>> HPC sites have Lustre filesystems that are too large to backup
>>> completely.  I typically do
>>> not recommend to use device-level backups for the OSTs, unless doing an
>>> OST hardware migration,
>>> and even then it is probably less disruptive to do Lustre-level file
>>> migration off the OST
>>> before swapping it out.
>>>
>>> Whether file level backups are used or not, I would recommend sites
>>> always make periodic
>>> device level backups of the MDT(s).  The amount of space needed for an
>>> MDT backup is small
>>> compared to the rest of the filesystem (e.g. a few TB at most), and can
>>> avoid the need for
>>> a full filesystem restore (e.g. multi-PB of data, if a full backup
>>> exists at all) even
>>> though all the data is still available on the OSTs.
>>>
>>> The MDT device-level backup can use relatively slow SATA drives, since
>>> they will mostly be
>>> used for linear writes (or occasionally linear reads for restore), so a
>>> few multi-T

Re: [lustre-discuss] Backup software for Lustre

2017-03-17 Thread Brett Lee
Hi.  In what I thought was a valid test, I was unable to confirm that a
backup and restore retained the layouts.  Perhaps my expectation or process
was incorrect?  The process was:

1.  Create 4 files, each with different stripe sizes and stripe counts
(verified with getstripe).
2.  Back up the files using tar-1.26-31.el7.x86_64.
3.  Recreate a file system and restore the files.

Backup command:  tar --xattrs -zcvf /scratch.tgz /scratch
Restore command:  tar zxvf /scratch.tgz

After restoration, getstripe showed that each file had the default stripe
count (1) and stripe size (1MB).
FWIW:  After restoring, getfattr produced the same result for each file:
# getfattr -d -m - -R 
lustre.lov=0s0AvRCwEdAAAEAAACAAAQAAEFAAA=
trusted.link=0s3/HqEQEuABYCAAAEAAUAMS5kZA==
trusted.lma=0sBgAAAB0A
trusted.lov=0s0AvRCwEdAAAEAAACAAAQAAEFAAA=

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com <https://www.trustpds.com/>

On Wed, Mar 15, 2017 at 5:03 AM, Dilger, Andreas <andreas.dil...@intel.com>
wrote:

> I believe Zmanda is already using GNU tar (or RHEL tar) for the actual
> backup storage?  In that case it should already work, since we fixed tar
> long ago to backup and restore xattrs in a way that preserves Lustre
> layouts.
>
> Cheers, Andreas
>
> On Mar 14, 2017, at 15:47, Brett Lee <brettlee.lus...@gmail.com> wrote:
>
> Thanks for the details, Andreas!
>
> Maybe OpenSFS can fund Zmanda so that their backup software can include
> the Lustre metadata... :)
>
> Brett
> --
> Protect Yourself Against Cybercrime
> PDS Software Solutions LLC
> https://www.TrustPDS.com <https://www.trustpds.com/>
>
> On Tue, Mar 14, 2017 at 3:13 PM, Dilger, Andreas <andreas.dil...@intel.com
> > wrote:
>
>> To reply to this old thread, there are two different kinds of Lustre
>> backup solutions:
>> - file level backups that traverse the client POSIX filesystem, for which
>> any number of
>>   commercial solutions exist.  Making these solutions "capable of saving
>> Lustre metadata"
>>   boils down to two simple things - save the "lustre.lov" xattr during
>> backup (at a minimum,
>>   other xattrs also should be backed up), and then using mknod(2) +
>> setxattr() to restore
>>   the "lustre.lov" xattr before opening the file and restoring the data.
>>
>> - device level backups (e.g. "dd" for ldiskfs, and "zfs send/recv" for
>> ZFS).
>>
>> Using the file level backups allows backup/restore of subsets of the
>> filesystem, since many
>> HPC sites have Lustre filesystems that are too large to backup
>> completely.  I typically do
>> not recommend to use device-level backups for the OSTs, unless doing an
>> OST hardware migration,
>> and even then it is probably less disruptive to do Lustre-level file
>> migration off the OST
>> before swapping it out.
>>
>> Whether file level backups are used or not, I would recommend sites
>> always make periodic
>> device level backups of the MDT(s).  The amount of space needed for an
>> MDT backup is small
>> compared to the rest of the filesystem (e.g. a few TB at most), and can
>> avoid the need for
>> a full filesystem restore (e.g. multi-PB of data, if a full backup exists
>> at all) even
>> though all the data is still available on the OSTs.
>>
>> The MDT device-level backup can use relatively slow SATA drives, since
>> they will mostly be
>> used for linear writes (or occasionally linear reads for restore), so a
>> few multi-TB SATA III
>> drives is sufficient for storing a rotating set of MDT device backups.
>> At 150MB/s for even
>> a single SATA drive, this is about 2h/TB, which is reasonable to do once
>> a week (or more often
>> for smaller MDTs).
>>
>> While using an LVM snapshot of the ldiskfs MDT for the backup source is
>> desirable for consistency
>> reasons, having even an MDT backup from a mounted and in-use MDT is
>> better than nothing at
>> all when a problem is hit, since e2fsck can repair the in-use
>> inconsistencies fairly easily,
>> and Lustre can deal with inconsistencies between the MDT and OST
>> reasonably (at most returning
>> an -ENOENT error to the client for files that were deleted).
>>
>> Cheers, Andreas
>>
>> On Feb 7, 2017, at 12:32, Andrew Holway <andrew.hol...@gmail.com> wrote:
>> >
>> > Would it be difficult to suspend IO and snapshot all the nodes
>> 

Re: [lustre-discuss] Backup software for Lustre

2017-03-14 Thread Brett Lee
Thanks for the details, Andreas!

Maybe OpenSFS can fund Zmanda so that their backup software can include the
Lustre metadata... :)

Brett
--
Protect Yourself Against Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com <https://www.trustpds.com/>

On Tue, Mar 14, 2017 at 3:13 PM, Dilger, Andreas <andreas.dil...@intel.com>
wrote:

> To reply to this old thread, there are two different kinds of Lustre
> backup solutions:
> - file level backups that traverse the client POSIX filesystem, for which
> any number of
>   commercial solutions exist.  Making these solutions "capable of saving
> Lustre metadata"
>   boils down to two simple things - save the "lustre.lov" xattr during
> backup (at a minimum,
>   other xattrs also should be backed up), and then using mknod(2) +
> setxattr() to restore
>   the "lustre.lov" xattr before opening the file and restoring the data.
>
> - device level backups (e.g. "dd" for ldiskfs, and "zfs send/recv" for
> ZFS).
>
> Using the file level backups allows backup/restore of subsets of the
> filesystem, since many
> HPC sites have Lustre filesystems that are too large to backup
> completely.  I typically do
> not recommend to use device-level backups for the OSTs, unless doing an
> OST hardware migration,
> and even then it is probably less disruptive to do Lustre-level file
> migration off the OST
> before swapping it out.
>
> Whether file level backups are used or not, I would recommend sites always
> make periodic
> device level backups of the MDT(s).  The amount of space needed for an MDT
> backup is small
> compared to the rest of the filesystem (e.g. a few TB at most), and can
> avoid the need for
> a full filesystem restore (e.g. multi-PB of data, if a full backup exists
> at all) even
> though all the data is still available on the OSTs.
>
> The MDT device-level backup can use relatively slow SATA drives, since
> they will mostly be
> used for linear writes (or occasionally linear reads for restore), so a
> few multi-TB SATA III
> drives is sufficient for storing a rotating set of MDT device backups.  At
> 150MB/s for even
> a single SATA drive, this is about 2h/TB, which is reasonable to do once a
> week (or more often
> for smaller MDTs).
>
> While using an LVM snapshot of the ldiskfs MDT for the backup source is
> desirable for consistency
> reasons, having even an MDT backup from a mounted and in-use MDT is better
> than nothing at
> all when a problem is hit, since e2fsck can repair the in-use
> inconsistencies fairly easily,
> and Lustre can deal with inconsistencies between the MDT and OST
> reasonably (at most returning
> an -ENOENT error to the client for files that were deleted).
>
> Cheers, Andreas
>
> On Feb 7, 2017, at 12:32, Andrew Holway <andrew.hol...@gmail.com> wrote:
> >
> > Would it be difficult to suspend IO and snapshot all the nodes (assuming
> ZFS). Could you be sure that your MDS and OSS are synchronised?
> >
> > On 7 February 2017 at 19:52, Mike Selway <msel...@cray.com> wrote:
> >> Hello Brett,
> >>
> >>Actually, looking for someone who uses a commercialized
> approach (that retains user metadata and Lustre extended metadata) and not
> specifically the manual approaches of Chapter 17.
> >>
> >> Thanks!
> >> Mike
> >>
> >> Mike Selway | Sr. Tiered Storage Architect | Cray Inc.
> >> Work +1-301-332-4116 | msel...@cray.com
> >> 146 Castlemaine Ct,   Castle Rock,  CO  80104 | www.cray.com
> >>
> >>
> >>> From: Brett Lee [mailto:brettlee.lus...@gmail.com]
> >>> Sent: Monday, February 06, 2017 11:45 AM
> >>> To: Mike Selway <msel...@cray.com>
> >>> Cc: lustre-discuss@lists.lustre.org
> >>> Subject: Re: [lustre-discuss] Backup software for Lustre
> >>>
> >>> Hey Mike,
> >>>
> >>> "Chapter 17" and
> >>> http://www.intel.com/content/www/us/en/lustre/backup-and-
> restore-training.html
> >>>
> >>> both contain methods to backup & restore the entire Lustre file system.
> >>>
> >>> Are you looking for a solution that backs up only the (user) data
> files and their associated metadata (e.g. xattrs)?
> >>>
> >>> Brett
> >>> --
> >>> Protect Yourself From Cybercrime
> >>> PDS Software Solutions LLC
> >>> https://www.TrustPDS.com
> >>>
> >>>> On Mon, Feb 6, 2017 at 11:12 AM, Mike Selway <msel...@cray.com>
> wrote:
> >>>>
> >>>> Hello,
> >>>>  Anyone aware of and/or using a Backup software package to
> protect their LFS environment (not referring to the tools/scripts suggested
> in Chapter 17).
> >>>>
> >>>> Regards,
> >>>> Mike
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Backup software for Lustre

2017-02-06 Thread Brett Lee
Hey Mike,

"Chapter 17" and

http://www.intel.com/content/www/us/en/lustre/backup-and-restore-training.html

both contain methods to backup & restore the entire Lustre file system.

Are you looking for a solution that backs up only the (user) data files and
their associated metadata (e.g. xattrs)?

Brett
--
Protect Yourself From Cybercrime
PDS Software Solutions LLC
https://www.TrustPDS.com 

On Mon, Feb 6, 2017 at 11:12 AM, Mike Selway  wrote:

> Hello,
>
>Anyone aware of and/or using a Backup software package to
> protect their LFS environment (not referring to the tools/scripts suggested
> in Chapter 17).
>
>
>
> Regards,
>
> Mike
>
>
>
> *Mike Selway* *|** Sr. Tiered Storage Architect | Cray Inc.*
>
> Work +1-301-332-4116 <(301)%20332-4116> | msel...@cray.com
>
> 146 Castlemaine Ct,   Castle Rock,  CO  80104 | www.cray.com
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] many 'ksym' packages required

2016-12-19 Thread Brett Lee
Were there any relevant messages while you were building the RPMs?

Per https://jira.hpdd.intel.com/browse/LU-5614

maybe a missing dependency, like kernel-devel?
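
One quick check (a sketch; the package names assume the kmod-style
SPL/ZFS install) is to compare the ksyms the Lustre module requires
against what the installed kmods actually provide:

    # Symbols the osd-zfs module was built against
    rpm -qp --requires kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64.rpm | grep ksym
    # Symbols the installed kmods export
    rpm -q --provides kmod-spl kmod-zfs | grep ksym

If the checksums differ, the Lustre kmod was built against a different
spl/zfs build than the one installed.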


Brett
--
Protect your Information and Identity with PDS
PDS Software Solutions LLC
https://www.TrustPDS.com 

On Mon, Dec 19, 2016 at 9:34 PM, Andrus, Brian Contractor 
wrote:

> All,
> I have been running into an issue lately with RPMs I build and the ones I
> download from Intel: when I try to install the server ZFS module
> (kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64.rpm), it gives me a TON of
> errors about Requires: ksym(xxx)
> Example:
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_objset_pool) = 0xa8cb0bd0
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(zap_cursor_serialize) = 0x3f455060
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_prefetch) = 0x7947c677
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dsl_prop_register) = 0xa6f021e0
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_objset_space) = 0x0a5a5f8f
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(zfs_prop_to_name) = 0xa483a8c3
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(txg_wait_callbacks) = 0x90f50ab1
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(nvlist_pack) = 0x424ac2e1
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_buf_rele) = 0x53e356d2
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_buf_hold_array_by_bonus) = 0x330ef227
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_objset_disown) = 0x27d01e19
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>
> Has anyone seen this before and know what the issue could be?
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] building lustre from source rpms, mellanox OFED, CentOS 6.8

2016-12-16 Thread Brett Lee
Hi Lana, Here's a link:
https://wiki.hpdd.intel.com/display/PUB/Building+Lustre+from+Source

The first time through may raise some questions (at least it did for
me).  I'm sure this list can help if anything is unclear.
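
The main point for Mellanox OFED is telling configure to build against
the MOFED kernel sources instead of the in-kernel stack. A minimal
sketch (the paths are the usual MOFED install locations - verify them
on your build host):

    cd lustre-release
    ./configure --with-linux=/usr/src/kernels/$(uname -r) \
                --with-o2ib=/usr/src/ofa_kernel/default
    make rpms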

Brett
--
Secure your confidential information with PDS 2
PDS Software Solutions LLC
https://www.TrustPDS.com 

On Fri, Dec 16, 2016 at 10:35 AM, Lana Deere  wrote:

> Hi,
>
> The Lustre manual says that instructions for building Lustre from source
> are available online, but I have not been able to locate anything
> Lustre-specific, just general rpmbuild documentation and items in Jira
> about trouble people had.  Can someone give me a pointer to any
> Lustre-specific documentation there might be?
>
> The reason I want to build from source is because I need to upgrade the
> OFED in CentOS from the distribution version to a current Mellanox version
> (3.4-2, ideally).  Has anyone had any success in building Lustre against
> the Mellanox OFED stack, and if so is there any help or tips online
> somewhere for me to look at?
>
> Thanks!
>
> .. Lana (lana.de...@gmail.com)
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Brett Lee
Good to hear, Ben.  This list is pretty helpful, so if you get stuck, check
back in.

Brett
--
Secure your confidential information with PDS2
PDS Software Solutions LLC
https://www.TrustPDS.com
On Dec 12, 2016 7:49 PM, "Markham Benjamin" <ben.mark...@xiilab.com> wrote:

> Hi Brett,
>
> Thanks for this. This all seems reasonable and understandable. Thanks for
> your help
>
> -Ben
>
> On Dec 13, 2016, at 11:39 AM, Brett Lee <brettlee.lus...@gmail.com> wrote:
>
> Hi Markham,
>
> Maybe think of Lustre as a bunch of different components...  While you can
> combine the components (server services and client services) on one node,
> you can also put each service on a different node, and connect them via a
> network.
>
> To begin, I suggest using just one node.  Start all the server services,
> and then start the client service.  Then start and stop it all a few
> times.  Once you get the hang of it, then split the server services to
> different nodes, and put the client on its own node also.
>
> Seem reasonable?
> --
> Secure your confidential information with PDS2
> PDS Software Solutions LLC
> https://www.TrustPDS.com
>
> On Mon, Dec 12, 2016 at 7:33 PM, Markham Benjamin <ben.mark...@xiilab.com>
> wrote:
>
>> Ah sorry. What I mean is something I could experiment on without any
>> hardware limitations. I hope that makes sense.
>>
>> But it seems I could just run Lustre on one node.
>>
>> -Ben
>>
>> On Dec 13, 2016, at 11:26 AM, Brett Lee <brettlee.lus...@gmail.com>
>> wrote:
>>
>> Proper?  Please expand on that. :)
>>
>> To get started, you could run Lustre on just one node.
>>
>> Brett
>> --
>> Secure your confidential information with PDS2
>> PDS Software Solutions LLC
>> https://www.TrustPDS.com
>> On Dec 12, 2016 7:08 PM, "Markham Benjamin" <ben.mark...@xiilab.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I just have been reading about Lustre and getting into it. I just have a
>>> simple question: about how many computers would I need in order to set up a
>>> proper simple LustreFS?
>>>
>>> Thanks,
>>> Ben.
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>
>>
>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Brett Lee
Hi Markham,

Maybe think of Lustre as a bunch of different components...  While you can
combine the components (server services and client services) on one node,
you can also put each service on a different node, and connect them via a
network.

To begin, I suggest using just one node.  Start all the server services,
and then start the client service.  Then start and stop it all a few
times.  Once you get the hang of it, then split the server services to
different nodes, and put the client on its own node also.
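
On one node, that works out to something like this sketch (ldiskfs
backend; /dev/sdb, /dev/sdc, and the 192.168.1.10@tcp NID are
placeholders for your own devices and network address):

    # MGS+MDT, one OST, and a client mount, all on the same node
    mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/sdb
    mkdir -p /mnt/mdt && mount -t lustre /dev/sdb /mnt/mdt
    mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=192.168.1.10@tcp /dev/sdc
    mkdir -p /mnt/ost0 && mount -t lustre /dev/sdc /mnt/ost0
    mkdir -p /mnt/lustre && mount -t lustre 192.168.1.10@tcp:/testfs /mnt/lustre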

Seem reasonable?
-- 
Secure your confidential information with PDS2
PDS Software Solutions LLC
https://www.TrustPDS.com

On Mon, Dec 12, 2016 at 7:33 PM, Markham Benjamin <ben.mark...@xiilab.com>
wrote:

> Ah sorry. What I mean is something I could experiment on without any
> hardware limitations. I hope that makes sense.
>
> But it seems I could just run Lustre on one node.
>
> -Ben
>
> On Dec 13, 2016, at 11:26 AM, Brett Lee <brettlee.lus...@gmail.com> wrote:
>
> Proper?  Please expand on that. :)
>
> To get started, you could run Lustre on just one node.
>
> Brett
> --
> Secure your confidential information with PDS2
> PDS Software Solutions LLC
> https://www.TrustPDS.com
> On Dec 12, 2016 7:08 PM, "Markham Benjamin" <ben.mark...@xiilab.com>
> wrote:
>
>> Hello,
>>
>> I just have been reading about Lustre and getting into it. I just have a
>> simple question: about how many computers would I need in order to set up a
>> proper simple LustreFS?
>>
>> Thanks,
>> Ben.
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Simple Lustre question

2016-12-12 Thread Brett Lee
Proper?  Please expand on that. :)

To get started, you could run Lustre on just one node.

Brett
--
Secure your confidential information with PDS2
PDS Software Solutions LLC
https://www.TrustPDS.com
On Dec 12, 2016 7:08 PM, "Markham Benjamin"  wrote:

> Hello,
>
> I just have been reading about Lustre and getting into it. I just have a
> simple question: about how many computers would I need in order to set up a
> proper simple LustreFS?
>
> Thanks,
> Ben.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Added OSTs, now lnet errors

2016-12-11 Thread Brett Lee
Hi Steve,  You're welcome for the suggestion.  I offered it because you
mentioned adding a couple of new OSS servers and then noticing the entries
in the logs.  It would be helpful to know where you are seeing the errors -
on the new nodes only, or elsewhere?  Generally, networks with existing
problems seem to work OK at low bandwidths, but problems start to appear as
loads increase - hence the suggestion to check the network.  A quick check
could be made with LNet self-test between two different sets of nodes - set
1 nodes that show the problem, and set 2 that do not.  Best,
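
P.S. A minimal self-test session looks something like this (the NIDs are
placeholders for one node from each set; the lnet_selftest module must be
loaded on every node involved):

    modprobe lnet_selftest
    export LST_SESSION=$$
    lst new_session read_test
    lst add_group set1 10.128.10.29@tcp1
    lst add_group set2 10.128.10.40@tcp1
    lst add_batch bulk_read
    lst add_test --batch bulk_read --from set1 --to set2 brw read size=1M
    lst run bulk_read
    lst stat set1 set2    # watch the rates and error counts
    lst end_session
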
On Dec 11, 2016 6:05 PM, "Steve Barnet" <bar...@icecube.wisc.edu> wrote:

> Hi Brett,
>
>
> On 12/11/16 4:46 PM, Brett Lee wrote:
>
>> Steve, It might be the network that LNet is running on.  Have you run
>> some bandwidth tests without LNet to check for network problems?
>>
>
>
> It's running over a 10Gb/s Ethernet network that is carrying
> other OSS traffic successfully. No routers or other fancy LNET
> features in play. However, it is quite possible that there are
> issues with the networking on the host side. Definitely on my
> list of things to test out.
>
>   At this point, I'm just trying to narrow the search space.
> I didn't find anything particularly revealing when I searched
> around, so I'm hoping some expert eyes can shine a bit of
> light on the situation.
>
> Thanks for the tip!
>
> Best,
>
> ---Steve
>
>
>> On Dec 11, 2016 3:37 PM, "Steve Barnet" <bar...@icecube.wisc.edu> wrote:
>>
>> Hi all,
>>
>>   Seeing something very strange. I recently added two OSSes
>> and 10 OSTs to one of our filesystems. Things look OK under
>> light loads, but when we load them up, we start seeing lots
>> of LNet errors.
>>
>> OS: Scientific Linux 6.7
>> Lustre - Server: 2.8.0 Community version
>> Lustre - Client: 2.5.3
>>
>> The errors are below. Do these narrow the range of possible
>> problems?
>>
>>
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LNetError:
>> 7732:0:(socklnd_cb.c:2509:ksocknal_check_peer_timeouts()) Total 4
>> stale ZC_REQs for peer 10.128.10.29@tcp1 detected; the
>> oldest(880f6a90e000) timed out 7 secs ago, resid: 0, wmem: 0
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc 8805379f8000
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc 880f375dc000
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) @@@ network error on bulk
>> READ  req@880e506263c0 x1551187318090340/t0(0)
>> o3->092e941d-272a-09e3-502b-9338dbf387d3@10.128.10.29@tcp1:587/0
>> lens 488/432 e 3 to 0 dl 1481476687 ref 1 fl Interpret:/0/0 rc 0/0
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) Skipped 1 previous similar
>> message
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: Lustre: lfs2-OST0024: Bulk IO
>> read error with 092e941d-272a-09e3-502b-9338dbf387d3 (at
>> 10.128.10.29@tcp1), client will retry: rc -110
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc 8804db0ce000
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc 880aa4374000
>>
>>
>> Thanks much!
>>
>> Best,
>>
>> ---Steve
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Added OSTs, now lnet errors

2016-12-11 Thread Brett Lee
Steve, It might be the network that LNet is running on.  Have you run some
bandwidth tests without LNet to check for network problems?
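
For a raw TCP check outside LNet, something like iperf3 between an OSS and
a client would do (the hostname is a placeholder):

    # on the OSS
    iperf3 -s
    # on the client
    iperf3 -c lfs-ex-oss-20.example.com -P 4 -t 30
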
On Dec 11, 2016 3:37 PM, "Steve Barnet"  wrote:

> Hi all,
>
>   Seeing something very strange. I recently added two OSSes
> and 10 OSTs to one of our filesystems. Things look OK under
> light loads, but when we load them up, we start seeing lots
> of LNet errors.
>
> OS: Scientific Linux 6.7
> Lustre - Server: 2.8.0 Community version
> Lustre - Client: 2.5.3
>
> The errors are below. Do these narrow the range of possible
> problems?
>
>
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LNetError:
> 7732:0:(socklnd_cb.c:2509:ksocknal_check_peer_timeouts()) Total 4 stale
> ZC_REQs for peer 10.128.10.29@tcp1 detected; the oldest(880f6a90e000)
> timed out 7 secs ago, resid: 0, wmem: 0
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc 8805379f8000
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc 880f375dc000
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) @@@ network error on bulk READ
> req@880e506263c0 x1551187318090340/t0(0)
> o3->092e941d-272a-09e3-502b-9338dbf387d3@10.128.10.29@tcp1:587/0 lens
> 488/432 e 3 to 0 dl 1481476687 ref 1 fl Interpret:/0/0 rc 0/0
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) Skipped 1 previous similar
> message
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: Lustre: lfs2-OST0024: Bulk IO read
> error with 092e941d-272a-09e3-502b-9338dbf387d3 (at 10.128.10.29@tcp1),
> client will retry: rc -110
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc 8804db0ce000
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc 880aa4374000
>
>
> Thanks much!
>
> Best,
>
> ---Steve
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org