Re: [Gluster-users] Gluster 3.12.12: performance during heal and in general

2018-07-25 Thread Hu Bert
Hi Pranith,

Sorry, it took a while to count the directories. I'll try to answer your
questions as well as possible.

> What kind of data do you have?
> How many directories in the filesystem?
> On average how many files per directory?
> What is the depth of your directory hierarchy on average?
> What is average filesize?

We have mostly images (more than 95% of disk usage, 90% of file
count), some text files (like css, jsp, gpx etc.) and some binaries.

There are about 190.000 directories in the file system; maybe there
are some more because we're hit by bug 1512371 (parallel-readdir =
TRUE prevents directories listing). But the number of directories
could/will rise in the future (maybe millions).

Files per directory: this ranges from 0 to 100; on average it should be 20
files per directory (well, at least in the deepest dirs, see the
explanation below).

Average file size: ranges from a few hundred bytes up to 30 MB; on
average it should be 2-3 MB.

Directory hierarchy: the maximum depth as seen from within the volume is
6; the average should be 3.

volume name: shared
mount point on clients: /data/repository/shared/
below /shared/ there are 2 directories:
- public/: mainly calculated images (file sizes from a few KB up to
max 1 MB) and some resources (small PNGs with a size of a few hundred
bytes).
- private/: mainly source images; file sizes from 50 KB up to 30 MB.

We migrated from an NFS server (SPOF) to glusterfs and simply copied
our files. The images (which have an ID) are stored in the deepest
directories of the dir tree. Let me explain it better :-)

Directory structure for the images (I'll omit some other miscellaneous
stuff, but it looks quite similar):
- ID of an image has 7 or 8 digits
- /shared/private/: /(first 3 digits of ID)/(next 3 digits of ID)/$ID.jpg
- /shared/public/: /(first 3 digits of ID)/(next 3 digits of
ID)/$ID/$misc_formats.jpg

That's why we have so many (sub-)directories. Files are only stored at the
lowest level of the directory hierarchy. I hope this makes our structure
at least a bit more transparent.
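
To make the layout concrete, here is a minimal sketch of the ID-to-path
mapping described above (illustrative only; the function and the
"thumb.jpg" stand-in for $misc_formats are assumptions, not our actual
application code):

    def image_paths(image_id: str, fmt: str = "thumb.jpg") -> dict:
        """Map a 7- or 8-digit image ID to its paths, following the
        layout above; 'fmt' stands in for one of the $misc_formats."""
        a, b = image_id[:3], image_id[3:6]
        return {
            "private": f"/shared/private/{a}/{b}/{image_id}.jpg",
            "public":  f"/shared/public/{a}/{b}/{image_id}/{fmt}",
        }

    # Example: image_paths("1234567") ->
    #   private: /shared/private/123/456/1234567.jpg
    #   public:  /shared/public/123/456/1234567/thumb.jpg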

I hope there's something we can do to raise performance a bit. Thanks in
advance :-)


2018-07-24 10:40 GMT+02:00 Pranith Kumar Karampuri :
>
>
> On Mon, Jul 23, 2018 at 4:16 PM, Hu Bert  wrote:
>>
>> Well, over the weekend about 200GB were copied, so now there are
>> ~400GB copied to the brick. That's well below a speed of 10GB per
>> hour. If I copied the 1.6 TB directly, that would be done within max 2
>> days. But with the self heal this will take at least 20 days.
>>
>> Why is the performance that bad? No chance of speeding this up?
>
>
> What kind of data do you have?
> How many directories in the filesystem?
> On average how many files per directory?
> What is the depth of your directory hierarchy on average?
> What is average filesize?
>
> Based on this data we can see if anything can be improved, or if there are
> some enhancements that need to be implemented in gluster to address this
> kind of data layout.
>>
>>
>> 2018-07-20 9:41 GMT+02:00 Hu Bert :
>> > Hmm... does no one have any idea?
>> >
>> > An additional question: the hdd on server gluster12 was replaced; so far
>> > ~220 GB have been copied. On the other 2 servers I see a lot of entries in
>> > glustershd.log, about 312.000 and 336.000 entries respectively there
>> > yesterday, most of them (current log output) looking like this:
>> >
>> > [2018-07-20 07:30:49.757595] I [MSGID: 108026]
>> > [afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
>> > Completed data selfheal on 0d863a62-0dd8-401c-b699-2b642d9fd2b6.
>> > sources=0 [2]  sinks=1
>> > [2018-07-20 07:30:49.992398] I [MSGID: 108026]
>> > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> > 0-shared-replicate-3: performing metadata selfheal on
>> > 0d863a62-0dd8-401c-b699-2b642d9fd2b6
>> > [2018-07-20 07:30:50.243551] I [MSGID: 108026]
>> > [afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
>> > Completed metadata selfheal on 0d863a62-0dd8-401c-b699-2b642d9fd2b6.
>> > sources=0 [2]  sinks=1
>> >
>> > or like this:
>> >
>> > [2018-07-20 07:38:41.726943] I [MSGID: 108026]
>> > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> > 0-shared-replicate-3: performing metadata selfheal on
>> > 9276097a-cdac-4d12-9dc6-04b1ea4458ba
>> > [2018-07-20 07:38:41.855737] I [MSGID: 108026]
>> > [afr-self-heal-common.c:1724:afr_log_selfheal] 0-shared-replicate-3:
>> > Completed metadata selfheal on 9276097a-cdac-4d12-9dc6-04b1ea4458ba.
>> > sources=[0] 2  sinks=1
>> > [2018-07-20 07:38:44.755800] I [MSGID: 108026]
>> > [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
>> > 0-shared-replicate-3: performing entry selfheal on
>> > 9276097a-cdac-4d12-9dc6-04b1ea4458ba
>> >
>> > Is this behaviour normal? I'd expect these messages on the server with
>> > the failed brick, not on the other ones.
>> >
>> > 2018-07-19 8:31 GMT+02:00 Hu Bert :
>> >> Hi there,
>> >>
>> >> I sent this mail yesterday, but somehow it didn't work (it wasn't
>> >> archived), so please be indulgent.

Re: [Gluster-users] trying to figure out the best solution for vm and email storage

2018-07-25 Thread Vlad Kopylov
Just create one replica 3 volume with 1 brick on each of the 3 storage
servers. RAID5 for the servers will be more than enough - it is already
replica 3.
Use oVirt to mount glusterfs to the VM from the hosts (as it uses libgfapi)
rather than a fuse mount from within the VM itself; libgfapi is supposedly
faster. It might depend on which mail storage type you use - if maildir,
libgfapi to the VM on the host should be better.
Also mind that you might need compression for the mail storage and
deduplication (of attachments).
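
As a minimal sketch of that first suggestion (the volume name, hostnames
and brick paths are placeholders, and any further volume options are left
out):

    import subprocess

    # One replica-3 volume, one brick per storage server; all names
    # below are placeholders for illustration only.
    bricks = [
        "stor1:/bricks/vmstore/brick",
        "stor2:/bricks/vmstore/brick",
        "stor3:/bricks/vmstore/brick",
    ]
    subprocess.run(["gluster", "volume", "create", "vmstore",
                    "replica", "3", *bricks], check=True)
    subprocess.run(["gluster", "volume", "start", "vmstore"], check=True)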


On Wed, Jul 25, 2018 at 5:11 AM, Γιώργος Βασιλόπουλος  wrote:

> Hello
> I am trying to lay out my options regarding storage with glusterfs, for VM
> storage and email storage.
>
> The hardware at my disposal is fixed: I have 9 servers for running VMs
> under oVirt 4.2 and 3 servers for storage.
>
> The 3 storage machines are similar; each has 2x E5-2640v3 CPUs, 128GB
> RAM and 2x 10G Ethernet.
> Each storage server has 2x 300GB 10k drives inside, which I intend to use
> for the OS install and maybe a little volume for ISOs over NFS.
> Also present are 6x 200GB SSD drives, which I am thinking of using as
> tiering volume(s).
> The main storage is an external JBOD box with 12x 4TB drives connected via
> SAS to the server through a RAID controller capable of various RAID
> levels.
>
> So what I'm thinking is that I will create 3 replica 3 arbiter 1 volumes
> (high availability is the no. 1 requirement) in a cyclic fashion: 2 data
> bricks and one arbiter on each storage server.
> I will implement RAID6 with one spare drive on each server (we are on an
> island and getting a disk replacement can occasionally take days), which
> will give me about 36T of usable storage per server.
> So for the VM storage I am thinking of 2 volumes of 17TB each, plus an
> arbiter of 1TB.
> What messes things up is that I was required to put the email storage in
> this installation. Our email system is pretty big and busy, with about 5
> users currently at 13T of storage.
> Currently it runs on a few VMs and uses NFS storage served from another
> VM. It runs postfix/dovecot, and right now a single big VM does mailbox
> delivery, but this is reaching its limits. Mail storage is currently on an
> EMC VNX5500.
> But it will be moved to glusterfs for various reasons.
>
> I would like some advice regarding the email storage. I think my options
> are:
>
> 1a. Use a VM as an NFS server, give it a huge disk (raw image on gluster,
> VM-optimized) and be done with it.
> 1b. Use a VM as an NFS server, give it 2 or 3 disks unified under an LVM
> vg->lv (raw images on gluster, VM-optimized) and maybe take advantage of
> using 2-3 io-threads in oVirt to write to 2-3 disks simultaneously. Will
> this give extra performance?
>
> 2. Give gluster as NFS straight to dovecot, but I wonder if this will have
> a performance drawback since it will be fuse mounted. I am also worried
> about the arbiter volume: since there will be thousands of small files,
> the arbiter will practically have to be as large as the data bricks, or
> half that size.
>
> 3. Give gluster as a glusterfs mount point, which I think will have about
> the same issues as 2.
>
> I have read about problems with dovecot indexes and glusterfs. Is this
> still an issue, or is it a problem that only shows up when there is no
> dovecot director?
> Personally I am inclined towards solution 1, because I think the arbiter
> volumes will be smaller (am I right?), though it may have some overhead
> from NFS on the VM. On the other hand, this solution will use libgfapi,
> which might balance things a bit.
> Would it help if, in such a case, I used a small (16MB) shard size and
> tiering?
>
> I'm afraid I have it a bit mixed up in my mind and I could really use some
> help.
>
>
>
> --
> Βασιλόπουλος Γιώργος
> Electrical Engineer T.E.
> Computer Systems Administrator
>
> Πανεπιστήμιο Κρήτης (University of Crete)
> Κ.Υ.Υ.Τ.Π.Ε.
> Communications and Networks Department
> Voutes, Heraklion 70013
> Tel   : 2810393310
> email : g.vasilopou...@uoc.gr
> http://www.ucnet.uoc.gr
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] trying to figure out the best solution for vm and email storage

2018-07-25 Thread Γιώργος Βασιλόπουλος

Hello
I am trying to lay out my options regarding storage with glusterfs, for VM
storage and email storage.


The hardware at my disposal is fixed: I have 9 servers for running VMs
under oVirt 4.2 and 3 servers for storage.


The 3 storage machines are similar; each has 2x E5-2640v3 CPUs, 128GB
RAM and 2x 10G Ethernet.
Each storage server has 2x 300GB 10k drives inside, which I intend to use
for the OS install and maybe a little volume for ISOs over NFS.
Also present are 6x 200GB SSD drives, which I am thinking of using as
tiering volume(s).
The main storage is an external JBOD box with 12x 4TB drives connected via
SAS to the server through a RAID controller capable of various RAID
levels.


So what I'm thinking is that I will create 3 replica 3 arbiter 1 volumes
(high availability is the no. 1 requirement) in a cyclic fashion: 2 data
bricks and one arbiter on each storage server.
I will implement RAID6 with one spare drive on each server (we are on an
island and getting a disk replacement can occasionally take days), which
will give me about 36T of usable storage per server.
So for the VM storage I am thinking of 2 volumes of 17TB each, plus an
arbiter of 1TB.
What messes things up is that I was required to put the email storage in
this installation. Our email system is pretty big and busy, with about 5
users currently at 13T of storage.
Currently it runs on a few VMs and uses NFS storage served from another
VM. It runs postfix/dovecot, and right now a single big VM does mailbox
delivery, but this is reaching its limits. Mail storage is currently on an
EMC VNX5500.

But it will be moved to glusterfs for various reasons.

I would like some advice regarding the email storage. I think my options
are:


1a. Use a VM as an NFS server, give it a huge disk (raw image on gluster,
VM-optimized) and be done with it.
1b. Use a VM as an NFS server, give it 2 or 3 disks unified under an LVM
vg->lv (raw images on gluster, VM-optimized) and maybe take advantage of
using 2-3 io-threads in oVirt to write to 2-3 disks simultaneously. Will
this give extra performance?


2. Give gluster as NFS straight to dovecot, but I wonder if this will have
a performance drawback since it will be fuse mounted. I am also worried
about the arbiter volume: since there will be thousands of small files, the
arbiter will practically have to be as large as the data bricks, or half
that size (a rough sizing sketch follows after this list).

3. Give gluster as a glusterfs mount point, which I think will have about
the same issues as 2.
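
As a rough back-of-the-envelope sketch for the arbiter sizing worry in
option 2 (assuming the commonly cited figure of roughly 4 KB of arbiter
space per file and an average message size of about 100 KB; both numbers
are assumptions to verify against this setup, not measurements from it):

    # Rough arbiter sizing estimate; the per-file and average-message
    # figures below are assumptions, not measured values.
    mail_data_bytes = 13e12            # ~13 TB of mail
    avg_message_bytes = 100e3          # assumed average message size
    per_file_arbiter_bytes = 4096      # assumed arbiter metadata per file

    files = mail_data_bytes / avg_message_bytes
    arbiter_bytes = files * per_file_arbiter_bytes
    print(f"~{files/1e6:.0f}M files -> roughly "
          f"{arbiter_bytes/1e12:.2f} TB on the arbiter")
    # -> ~130M files -> roughly 0.53 TB on the arbiter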


I have read about problems with dovecot indexes and glusterfs. Is this
still an issue, or is it a problem that only shows up when there is no
dovecot director?
Personally I am inclined towards solution 1, because I think the arbiter
volumes will be smaller (am I right?), though it may have some overhead
from NFS on the VM. On the other hand, this solution will use libgfapi,
which might balance things a bit.
Would it help if, in such a case, I used a small (16MB) shard size and
tiering?

I'm afraid I have it a bit mixed up in my mind and I could really use 
some help.




--
Βασιλόπουλος Γιώργος
Electrical Engineer T.E.
Computer Systems Administrator

Πανεπιστήμιο Κρήτης (University of Crete)
Κ.Υ.Υ.Τ.Π.Ε.
Communications and Networks Department
Voutes, Heraklion 70013
Tel   : 2810393310
email : g.vasilopou...@uoc.gr
http://www.ucnet.uoc.gr

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Announcing Gluster for Container Storage (GCS)

2018-07-25 Thread Vijay Bellur
Hi all,

We would like to let you know that some of us have started focusing on an
initiative called ‘Gluster for Container Storage’ (GCS for short). As of
now, one can already use Gluster as storage for containers by making use of
the different projects available in the github repositories associated with
Gluster and Heketi.
The goal of the GCS initiative is to provide an easier integration of these
projects so that they can be consumed together as designed. We are
primarily focused on integration with Kubernetes (k8s) through this
initiative.

Key projects for GCS include:
Glusterd2 (GD2)

Repo: https://github.com/gluster/glusterd2

The challenge we have with the current management layer of Gluster
(glusterd) is that it is not designed for a service-oriented architecture.
Heketi overcame this limitation and made Gluster consumable in k8s by
providing all the necessary hooks needed for supporting Persistent Volume
Claims.

Glusterd2 provides a service-oriented architecture for volume & cluster
management. GD2 also intends to natively provide many of the Heketi
functionalities needed by Kubernetes. Hence we are working on merging
Heketi with GD2, and you can follow more of this work in the issues
associated with the gd2 github repository.
gluster-block

Repo: https://github.com/gluster/gluster-block

This project intends to expose files in a gluster volume as block devices.
Gluster-block enables supporting ReadWriteOnce (RWO) PVCs and the
corresponding workloads in Kubernetes using gluster as the underlying
storage technology.

Gluster-block is intended to be consumed by stateful RWO applications like
databases, and by k8s infrastructure services like logging and metrics.
gluster-block is preferred over file-based Persistent Volumes in k8s for
stateful/transactional workloads as it provides better performance &
consistency guarantees.
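
As an illustration of how such an RWO claim is typically requested from
k8s (a sketch using the Kubernetes Python client; the StorageClass name
"glusterfs-block" and the other names are assumptions for illustration,
not the project's documented defaults):

    from kubernetes import client, config

    config.load_kube_config()

    # A ReadWriteOnce claim against a hypothetical gluster-block-backed
    # StorageClass; names and sizes are placeholders.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="db-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="glusterfs-block",
            resources=client.V1ResourceRequirements(
                requests={"storage": "10Gi"}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc)
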
anthill / operator

Repo: https://github.com/gluster/anthill

This project aims to add an operator for Gluster in Kubernetes. Since it
is relatively new, there are areas where you can contribute to make the
operator experience better (please refer to the list of issues). This
project intends to make the whole Gluster experience in k8s much smoother
by automating management tasks like installation, rolling upgrades, etc.
gluster-csi-driver

Repo: http://github.com/gluster/gluster-csi-driver

This project will provide CSI (Container Storage Interface) compliant
drivers for GlusterFS & gluster-block in k8s.
gluster-kubernetes

Repo: https://github.com/gluster/gluster-kubernetes

This project is intended to provide all the required installation and
management steps for getting gluster up and running in k8s.
GlusterFS

Repo: https://github.com/gluster/glusterfs

GlusterFS is the main and core repository of Gluster. To support storage in
the container world, we don't need all the features of Gluster. Hence, we
will be focusing on the stack that is absolutely required in k8s. This will
allow us to plan and execute tests well, and also provide users with a
setup that works without too many options to tweak.

Note that default glusterfs volumes will continue to work as they do now,
but the translator stack used in GCS will be much leaner and geared to work
optimally in k8s.
Monitoring

Repo: https://github.com/gluster/gluster-prometheus

As the k8s ecosystem provides its own native monitoring mechanisms, we
intend this project to be the placeholder for the required monitoring
plugins. The scope of this project is still a work in progress, and we
welcome your input in shaping it.

More details on this can be found at:
https://lists.gluster.org/pipermail/gluster-users/2018-July/034435.html

Gluster-Containers

Repo: https://github.com/gluster/gluster-containers

This repository provides container specs / Dockerfiles that can be used
with a container runtime like cri-o & docker.

Note that this is not an exhaustive or final list of projects involved with
GCS. We will continue to update the project list depending on the new
requirements and priorities that we discover in this journey.

We welcome you to join this journey by looking up the repositories and
contributing to them. As always, we are happy to hear your thoughts about
this initiative, and please stay tuned as we provide periodic updates about
GCS here!

Regards,

Vijay

(on behalf of Gluster maintainers @ Red Hat)
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Subject: Help needed in improving monitoring in Gluster

2018-07-25 Thread Maarten van Baarsel

On 23/7/2018 18:25, Sankarshan Mukhopadhyay wrote:

>> Regarding monitoring: I would love to see in my monitoring that
>> geo-replication is working as intended; at the moment I'm faking geo-rep
>> monitoring by having a process touch a file (every server involved in
>> gluster touches another file) on every volume and checking the mtime on
>> the slave.
>
> I'd like to request that, if possible, you elaborate on how you'd like
> to see the "as intended" situation. What kind of data points and/or
> visualization would aid you in arriving at that conclusion?



For this particular example: perhaps the last sync time, and the number of
files on both sides. I realize this is a difficult problem...


The number of files touched by the sync per run?

Currently there is a 'started/stopped/faulty' indicator for the geo-rep;
that could be exposed as well.



A monitoring interface that is guaranteed to be non-blocking would be a
great enhancement.
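
For reference, a minimal sketch of the touch-file workaround described
above (the canary path and the lag threshold are placeholders, not the
actual setup):

    import os, sys, time

    # Master side touches a canary file on the gluster mount; the slave
    # side alerts when the replicated file's mtime lags too far behind.
    CANARY = "/data/georep-canary"   # placeholder path on the volume
    MAX_LAG = 15 * 60                # allowed lag in seconds

    def touch_master():
        with open(CANARY, "a"):
            os.utime(CANARY, None)   # set mtime to "now"

    def check_slave():
        lag = time.time() - os.stat(CANARY).st_mtime
        if lag > MAX_LAG:
            sys.exit(f"CRITICAL: geo-rep lag is {lag:.0f}s for {CANARY}")
        print(f"OK: geo-rep lag is {lag:.0f}s")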


M.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users