Re: [lustre-discuss] Limit to the number of "--servicenode="

2018-09-29 Thread Andreas Dilger
I haven't checked the code recently, but I believe that there can be up to 32 
NIDs assigned per target.  

I've heard of at least some sites configuring failover groups of four OSS nodes, 
with three OSTs per OSS, so that each OST can fail over to a different OSS. That 
way a failover only increases the load on any one OSS to 4/3 instead of 2/1.
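As an illustration, formatting an OST with four failover service nodes might look like the following. This is a sketch, not a tested configuration; the hostnames/NIDs are hypothetical, and "--servicenode" can simply be repeated once per failover NID (each occurrence may also take a comma-separated NID list):

```shell
# Hypothetical example: format an OST that any of four OSS nodes can serve.
# Each --servicenode NID is added to the target's failover list
# (subject to the per-target NID limit mentioned above).
mkfs.lustre --fsname=testfs --ost --index=0 \
    --mgsnode=mgs@o2ib0 \
    --servicenode=oss1@o2ib0 \
    --servicenode=oss2@o2ib0 \
    --servicenode=oss3@o2ib0 \
    --servicenode=oss4@o2ib0 \
    /dev/sda
```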

Cheers, Andreas

> On Sep 29, 2018, at 22:27, David Cohen  wrote:
> 
> Hi,
> In all the manuals and examples there are only two "--servicenode=" options in 
> the creation of the MGS nodes and OSS. 
> Is that a limitation, or can I create more service nodes?
> Is the maximum number of service nodes different for the MGS and the OSS?
> 
> David
> 
> 
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] lustre writes wrong data under postgresql benchmark tool test (concurrent access to same data file with primary node write and standby node read)

2018-09-29 Thread Andreas Dilger
Is PG using O_DIRECT or buffered read/write?  Is it caching the pages in 
userspace?

Lustre will definitely keep pages consistent between clients, but if the 
application is caching the pages in userspace, and does not have any protocol 
between the nodes to invalidate cached pages when they are modified on disk, 
then the data will become inconsistent when one node modifies it.

That is the same reason it isn't possible to mount a single ext4 filesystem r/w 
on one node and r/o on another node with shared storage, because the filesystem 
doesn't expect data to be changing underneath it, and will cache pages in RAM 
and not re-read them if they are modified on the other node.
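When debugging a case like this, it can help to verify what is actually on disk independently of any cache. A hedged sketch (the file path is hypothetical): reading with direct I/O forces the read to go to the filesystem rather than locally cached pages.

```shell
# On the reading node, bypass the page cache with a direct I/O read, so the
# data comes from the OSS rather than cached pages. O_DIRECT reads must be
# aligned; bs=4096 satisfies typical alignment requirements.
dd if=/lustre/pg_xlog/somefile of=/dev/null bs=4096 iflag=direct
```

If a direct read returns correct data while the application sees stale data, that points at a userspace cache rather than the filesystem.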

Cheers, Andreas

On Sep 29, 2018, at 07:57, 王亮 <wanzifore...@gmail.com> wrote:

Hello, Lustre development team,

Background: we have two PostgreSQL instances running as a primary and a standby, 
and they share the same xlog file and data files (we changed the PG code to 
achieve this), located on the mounted Lustre file system. We wanted to give 
Lustre a try; we used GFS2 before and expected Lustre to show much better 
performance, but ...

We are hitting a concurrent read/write access problem with Lustre. Could you 
give us some suggestions? Any advice is appreciated, and thank you in advance : )
Note: we are certain the standby instance does not write any data to disk. (To 
confirm this, we also shut down the standby and used the pg_xlogdump tool to 
read the xlog file; the problem still occurred, and pg_xlogdump only queries 
the file, with no write operations.)


Scenario description:
There are 4 nodes (CentOS Linux release 7.4) connected by an InfiniBand 
network (driven by MLNX_OFED_LINUX-4.4):
10.0.0.106 acts as the MDS, with a local PCI-E 800GB SSD used as the MDT.
10.0.0.101 acts as the OSS, with an identical local PCI-E 800GB SSD used as the OST.
10.0.0.104 and 10.0.0.105 act as Lustre clients and mount the Lustre file system 
at the directory “/lustre”.
The Lustre-related packages are compiled from the official lustre-2.10.5-1.src.rpm.

The simplest verification (i.e. a dd command) passed without errors.

Error:
Then we start our customized PostgreSQL service on 104 and 105. 104 runs as the 
primary PostgreSQL server, and 105 runs as the secondary PostgreSQL server. Both 
PostgreSQL nodes read/write the shared directory “/lustre” provided by Lustre. 
The primary PostgreSQL server opens files in *RW* mode and writes to them; at 
the *same time* the secondary PostgreSQL server opens the same files in *R* mode 
and reads the data written by the primary PostgreSQL server, and it gets *wrong* 
data (the data flushed to disk by the primary is wrong, i.e. wrong data is 
written to disk). This happens when we run a PostgreSQL benchmark tool.


PS 1.
We tried different options to mount Lustre:
mount -t lustre -o flock 10.0.0.106@o2ib0:/birdfs /lustre
mount -t lustre -o flock -o ro 10.0.0.106@o2ib0:/birdfs /lustre
but the error always exists.

PS 2.
Attaching the initial setup information, in case it is helpful.
[root@106 ~]# mkfs.lustre --fsname=birdfs --mgs --mdt --index=0 --reformat /dev/nvme0n1

   Permanent disk data:
Target: birdfs:MDT
Index:  0
Lustre FS:  birdfs
Mount type: ldiskfs
Flags:  0x65
  (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
   target name   birdfs:MDT
   4k blocks 195353046
   options        -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L birdfs:MDT  -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 195353046

Writing CONFIGS/mountdata

[root@101 ~]# mkfs.lustre --fsname=birdfs --ost --reformat --index=0 --mgsnode=10.0.0.106@o2ib0 /dev/nvme0n1

   Permanent disk data:
Target: birdfs:OST
Index:  0
Lustre FS:  birdfs
Mount type: ldiskfs
Flags:  0x62
  (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.0.0.106@o2ib

device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
   target name   birdfs:OST
   4k blocks 195353046
   options        -J size=400 -I 512 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L birdfs:OST  -J size=400 -I 512 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme0n1 195353046
Writing CONFIGS/mountdata


Looking forward to any replies.

Regards,
Bird



Re: [lustre-discuss] Updating kernel will require recompilation of lustre kernel modules?

2018-09-29 Thread Peter Jones
Are you sure about Lustre 2.0?!

From: lustre-discuss  on behalf of 
Amjad Syed 
Date: Saturday, September 29, 2018 at 2:48 AM
To: "lustre-discuss@lists.lustre.org" 
Subject: [lustre-discuss] Updating kernel will require recompilation of lustre 
kernel modules?

Hello,
We have an HPC running RHEL 7.4. We are using Lustre 2.0.
Red Hat last week released an advisory to update the kernel to fix the Mutagen 
Astronomy bug.

Now the question is: if we upgrade the kernel on the MDS/OSS and the Linux 
clients, do we need to recompile Lustre against the updated kernel version?
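In general Lustre kernel modules are built against a specific kernel, so after a kernel update they need matching binaries: either packages pre-built for the new kernel or a rebuild from source. A rough sketch of checking and rebuilding (the src.rpm version shown is illustrative, not a specific release):

```shell
# Compare the running kernel against what the Lustre modules were built for.
uname -r
modinfo lustre | grep vermagic

# If they do not match, install Lustre packages built for the new kernel,
# or rebuild from the source RPM against it, e.g. a client-only build:
rpmbuild --rebuild --without servers lustre-2.x.y-1.src.rpm
```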


