Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-09-19 Thread Hu Bert
Hi Pranith, i recently upgraded to version 3.12.14, still no change in load/performance. Have you received any feedback? At the moment i have 3 options: - problem can be fixed within version 3.12 - upgrade to 4.1 and magically/hopefully "fix" the problem (might not help when problem is within

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-09-02 Thread Pranith Kumar Karampuri
On Fri, Aug 31, 2018 at 1:18 PM Hu Bert wrote: > Hi Pranith, > > i just wanted to ask if you were able to get any feedback from your > colleagues :-) > Sorry, I didn't get a chance to. I am working on a customer issue which is taking away cycles from any other work. Let me get back to you once

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-31 Thread Hu Bert
Hi Pranith, i just wanted to ask if you were able to get any feedback from your colleagues :-) btw.: we migrated some stuff (static resources, small files) to a nfs server that we actually wanted to replace by glusterfs. Load and cpu usage has gone down a bit, but still is asymmetric on the 3

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-28 Thread Hu Bert
Hm, i noticed that in the shared.log (volume log file) on gluster11 and gluster12 (but not on gluster13) i now see these warnings: [2018-08-28 07:18:57.224367] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-shared-dht: no subvolume for hash (value) = 3054593291 [2018-08-28
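To see how often this warning shows up on each server, the log lines can be counted by message ID. The following is only a sketch: `count_msgid` is a hypothetical helper name, and it assumes the log lines keep the `[MSGID: 109011]` form quoted above.

```shell
#!/bin/sh
# Count the lines on stdin that carry a given gluster message ID.
# count_msgid is a hypothetical helper, not part of the gluster tooling.
count_msgid() {
    msgid="${1:-109011}"
    # -F: match the bracketed tag as a fixed string, -c: print the count.
    # grep exits non-zero when the count is 0, so mask that with "|| true".
    grep -cF "[MSGID: ${msgid}]" || true
}
```

It could be run on each node as `count_msgid 109011 < /var/log/glusterfs/shared.log`; the exact log path is an assumption based on the directory and file name mentioned in this thread.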

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-28 Thread Hu Bert
a little update after about 2 hours of uptime: still/again high cpu usage by one of the brick processes. server load >30. gluster11: high cpu; brick /gluster/bricksdd1/; no hdd exchange so far gluster12: normal cpu; brick /gluster/bricksdd1_new/; hdd change /dev/sdd gluster13: high cpu; brick

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-27 Thread Hu Bert
Good Morning, today i updated + rebooted all gluster servers, kernel update to 4.9.0-8 and gluster to 3.12.13. Reboots went fine, but on one of the gluster servers (gluster13) one of the bricks did come up at the beginning but then lost connection. OK: Status of volume: shared Gluster process

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-27 Thread Hu Bert
yeah, on debian xyz.log.1 is always the former logfile which has been rotated by logrotate. Just checked the 3 servers: now it looks good, i will check it again tomorrow. very strange, maybe logrotate hasn't worked properly. the performance problems remain :-) 2018-08-27 15:41 GMT+02:00 Milind

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-27 Thread Milind Changire
On Thu, Aug 23, 2018 at 5:28 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > On Wed, Aug 22, 2018 at 12:01 PM Hu Bert wrote: > >> Just an addition: in general there are no log messages in >> /var/log/glusterfs/ (if you don't call 'gluster volume ...'), but on >> the node with the

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-23 Thread Pranith Kumar Karampuri
On Wed, Aug 22, 2018 at 12:01 PM Hu Bert wrote: > Just an addition: in general there are no log messages in > /var/log/glusterfs/ (if you don't call 'gluster volume ...'), but on > the node with the lowest load i see in cli.log.1: > > [2018-08-22 06:20:43.291055] I

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-22 Thread Hu Bert
Just an addition: in general there are no log messages in /var/log/glusterfs/ (if you don't call 'gluster volume ...'), but on the node with the lowest load i see in cli.log.1: [2018-08-22 06:20:43.291055] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-08-22

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-21 Thread Pranith Kumar Karampuri
On Tue, Aug 21, 2018 at 11:40 AM Hu Bert wrote: > Good morning :-) > > gluster11: > ls -l /gluster/bricksdd1/shared/.glusterfs/indices/xattrop/ > total 0 > -- 1 root root 0 Aug 14 06:14 > xattrop-006b65d8-9e81-4886-b380-89168ea079bd > > gluster12: > ls -l

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-21 Thread Hu Bert
Good morning :-) gluster11: ls -l /gluster/bricksdd1/shared/.glusterfs/indices/xattrop/ total 0 -- 1 root root 0 Aug 14 06:14 xattrop-006b65d8-9e81-4886-b380-89168ea079bd gluster12: ls -l /gluster/bricksdd1_new/shared/.glusterfs/indices/xattrop/ total 0 -- 1 root root 0 Jul 17
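Instead of reading the `ls -l` output by eye, the index directory of each brick can be counted. This is a rough sketch under one assumption that should be verified: that everything in `indices/xattrop/` other than the `xattrop-<uuid>` base file stands for an item still awaiting heal. `count_pending_index_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# Count entries in a brick's xattrop index directory, skipping the
# xattrop-<uuid> base file. Assumption: the remaining entries are
# items that still need healing.
count_pending_index_entries() {
    dir="$1"
    # -mindepth/-maxdepth are GNU/BSD find extensions (available on Debian).
    find "$dir" -mindepth 1 -maxdepth 1 ! -name 'xattrop-*' | wc -l | tr -d ' '
}
```

On gluster11 the call would look like `count_pending_index_entries /gluster/bricksdd1/shared/.glusterfs/indices/xattrop/`; a listing that shows only the base file, as quoted above, would give 0.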

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-20 Thread Pranith Kumar Karampuri
On Tue, Aug 21, 2018 at 10:13 AM Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > > > On Mon, Aug 20, 2018 at 3:20 PM Hu Bert wrote: > >> Regarding hardware the machines are identical. Intel Xeon E5-1650 v3 >> Hexa-Core; 64 GB DDR4 ECC; Dell PERC H330 8 Port SAS/SATA 12 GBit/s >> RAID

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-20 Thread Pranith Kumar Karampuri
On Mon, Aug 20, 2018 at 3:20 PM Hu Bert wrote: > Regarding hardware the machines are identical. Intel Xeon E5-1650 v3 > Hexa-Core; 64 GB DDR4 ECC; Dell PERC H330 8 Port SAS/SATA 12 GBit/s > RAID Controller; operating system running on a raid1, then 4 disks > (JBOD) as bricks. > > Ok, i ran perf

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-20 Thread Hu Bert
Regarding hardware the machines are identical. Intel Xeon E5-1650 v3 Hexa-Core; 64 GB DDR4 ECC; Dell PERC H330 8 Port SAS/SATA 12 GBit/s RAID Controller; operating system running on a raid1, then 4 disks (JBOD) as bricks. Ok, i ran perf for a few seconds. perf record

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-20 Thread Hu Bert
gluster volume heal shared info | grep -i number Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries: 0 Number of entries:

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-20 Thread Pranith Kumar Karampuri
There are a lot of Lookup operations in the system, but I am not able to find out why. Could you check the output of # gluster volume heal <VOLNAME> info | grep -i number It should print all zeros. On Fri, Aug 17, 2018 at 1:49 PM Hu Bert wrote: > I don't know what you exactly mean with workload, but the
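With many bricks, checking for "all zeros" by eye is error-prone. A small sketch that adds up the per-brick counts instead; `sum_heal_entries` is a hypothetical helper, and it assumes the `Number of entries: N` line format shown in the reply to this message.

```shell
#!/bin/sh
# Sum the "Number of entries: N" lines read from stdin.
# Non-numeric values (for example "-") count as 0.
sum_heal_entries() {
    awk -F': *' 'tolower($0) ~ /^number of entries:/ { sum += $2 }
                 END { print sum + 0 }'
}
```

Usage for the volume in this thread would be `gluster volume heal shared info | sum_heal_entries`; a result of 0 means no brick reports pending heals.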

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-17 Thread Hu Bert
I don't know what you exactly mean with workload, but the main function of the volume is storing (incl. writing, reading) images (from hundreds of bytes up to 30 MBs, overall ~7TB). The work is done by apache tomcat servers writing to / reading from the volume. Besides images there are some text

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-17 Thread Pranith Kumar Karampuri
There seem to be too many lookup operations compared to any other operation. What is the workload on the volume? On Fri, Aug 17, 2018 at 12:47 PM Hu Bert wrote: > i hope i did get it right. > > gluster volume profile shared start > wait 10 minutes > gluster volume profile shared info >

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-17 Thread Hu Bert
i hope i did get it right. gluster volume profile shared start wait 10 minutes gluster volume profile shared info gluster volume profile shared stop If that's ok, i've attached the output of the info command. 2018-08-17 8:31 GMT+02:00 Pranith Kumar Karampuri : > Please do volume profile also
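The start / wait / info / stop sequence described here can be wrapped in one function so that it runs the same way on every attempt. This is a minimal sketch: `capture_profile` and the `GLUSTER` override variable are hypothetical names added for illustration, while the three `gluster volume profile` subcommands are the ones quoted in the message.

```shell
#!/bin/sh
# Command used to talk to gluster; can be overridden for dry runs.
GLUSTER="${GLUSTER:-gluster}"

# capture_profile VOLUME SECONDS OUTFILE
# Starts profiling, waits, saves the "info" output, then stops profiling.
capture_profile() {
    vol="$1"
    seconds="$2"
    out="$3"
    "$GLUSTER" volume profile "$vol" start || return 1
    sleep "$seconds"
    "$GLUSTER" volume profile "$vol" info > "$out"
    "$GLUSTER" volume profile "$vol" stop
}
```

For the 10 minute window asked for in this thread, a call might be `capture_profile shared 600 /tmp/profile.shared.txt`, where the output path is only an example.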

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-17 Thread Pranith Kumar Karampuri
Please do volume profile also for around 10 minutes when CPU% is high. On Fri, Aug 17, 2018 at 11:56 AM Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > As per the output, all io-threads are using a lot of CPU. It is better to > check what the volume profile is to see what is leading to

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-17 Thread Pranith Kumar Karampuri
As per the output, all io-threads are using a lot of CPU. It is better to check what the volume profile is to see what is leading to so much work for io-threads. Please follow the documentation at https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/ section:

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-16 Thread Pranith Kumar Karampuri
Could you do the following on one of the nodes where you are observing high CPU usage and attach that file to this thread? We can find what threads/processes are leading to high usage. Do this for say 10 minutes when you see the ~100% CPU. top -bHd 5 > /tmp/top.${HOSTNAME}.txt On Wed, Aug 15,
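As written, the `top` command keeps running until it is interrupted. A sketch that bounds the capture to the requested duration by passing an iteration count via `-n`; `iterations_for` and `capture_top` are hypothetical helper names, and the output path follows the one in the message.

```shell
#!/bin/sh
# iterations_for MINUTES INTERVAL_SECONDS
# Number of top samples needed to cover the given duration.
iterations_for() {
    echo $(( $1 * 60 / $2 ))
}

# capture_top [MINUTES] [INTERVAL_SECONDS]
# Batch mode (-b), per-thread view (-H), fixed delay (-d), bounded run (-n).
capture_top() {
    minutes="${1:-10}"
    interval="${2:-5}"
    top -bH -d "$interval" -n "$(iterations_for "$minutes" "$interval")" \
        > "/tmp/top.${HOSTNAME:-$(hostname)}.txt"
}
```

Calling `capture_top 10 5` takes 120 samples at 5 second intervals, which covers the 10 minutes mentioned above.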

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-16 Thread Hu Bert
Hi, well, as the situation doesn't get better, we're quite helpless and mostly in the dark, so we're thinking about hiring some professional support. Any hint? :-) 2018-08-15 11:07 GMT+02:00 Hu Bert : > Hello again :-) > > The self heal must have finished as there are no log entries in >

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-15 Thread Hu Bert
Hello again :-) The self heal must have finished as there are no log entries in glustershd.log files anymore. According to munin disk latency (average io wait) has gone down to 100 ms, and disk utilization has gone down to ~60% - both on all servers and hard disks. But now system load on 2

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-14 Thread Hu Bert
Hi there, well, it seems the heal has finally finished. Couldn't see/find any related log message; is there such a message in a specific log file? But i see the same behaviour as when the last heal finished: all CPU cores are consumed by brick processes; not only by the formerly failed bricksdd1,

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-08-01 Thread Hu Bert
Hello :-) Just wanted to give a short report... >> It could be saturating in the day. But if enough self-heals are going on, >> even in the night it should have been close to 100%. > > Lowest utilization was 70% over night, but i'll check this > evening/weekend. Also that 'stat...' is running.

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Hu Bert
>> Btw.: i've seen in the munin stats that the disk utilization for >> bricksdd1 on the healthy gluster servers is between 70% (night) and >> almost 99% (daytime). So it looks like that the basic problem is the >> disk which seems not to be able to work faster? If so (heal) >> performance won't

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Pranith Kumar Karampuri
On Fri, Jul 27, 2018 at 1:32 PM, Hu Bert wrote: > 2018-07-27 9:22 GMT+02:00 Pranith Kumar Karampuri : > > > > > > On Fri, Jul 27, 2018 at 12:36 PM, Hu Bert > wrote: > >> > >> 2018-07-27 8:52 GMT+02:00 Pranith Kumar Karampuri >: > >> > > >> > > >> > On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert >

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Hu Bert
2018-07-27 9:22 GMT+02:00 Pranith Kumar Karampuri : > > > On Fri, Jul 27, 2018 at 12:36 PM, Hu Bert wrote: >> >> 2018-07-27 8:52 GMT+02:00 Pranith Kumar Karampuri : >> > >> > >> > On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert >> > wrote: >> >> >> >> > Do you already have all the 19 directories

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Pranith Kumar Karampuri
On Fri, Jul 27, 2018 at 12:36 PM, Hu Bert wrote: > 2018-07-27 8:52 GMT+02:00 Pranith Kumar Karampuri : > > > > > > On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert > wrote: > >> > >> > Do you already have all the 19 directories already created? If not > >> > could you find out which of the paths

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Hu Bert
2018-07-27 8:52 GMT+02:00 Pranith Kumar Karampuri : > > > On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert wrote: >> >> > Do you already have all the 19 directories already created? If not >> > could you find out which of the paths need it and do a stat directly >> > instead >> > of find? >> >>

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Pranith Kumar Karampuri
On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert wrote: > > Do you already have all the 19 directories already created? If not > could you find out which of the paths need it and do a stat directly > instead of find? > > Quite probable not all of them have been created (but counting how > much

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-27 Thread Hu Bert
> Do you already have all the 19 directories already created? If not could > you find out which of the paths need it and do a stat directly instead of > find? Quite probable not all of them have been created (but counting how much would take very long...). Hm, maybe running stat in a double

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Pranith Kumar Karampuri
On Fri, Jul 27, 2018 at 11:11 AM, Hu Bert wrote: > Good Morning :-) > > on server gluster11 about 1.25 million and on gluster13 about 1.35 > million log entries in glustershd.log file. About 70 GB got healed, > overall ~700GB of 2.0TB. Doesn't seem to run faster. I'm calling > 'find...' whenever

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Hu Bert
Good Morning :-) on server gluster11 about 1.25 million and on gluster13 about 1.35 million log entries in glustershd.log file. About 70 GB got healed, overall ~700GB of 2.0TB. Doesn't seem to run faster. I'm calling 'find...' whenever i notice that it has finished. Hmm... is it possible and

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Pranith Kumar Karampuri
On Thu, Jul 26, 2018 at 2:41 PM, Hu Bert wrote: > > Sorry, bad copy/paste :-(. > > np :-) > > The question regarding version 4.1 was meant more generally: does > gluster v4.0 etc. have a better performance than version 3.12 etc.? > Just curious :-) Sooner or later we have to upgrade anyway.

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Hu Bert
> Sorry, bad copy/paste :-(. np :-) The question regarding version 4.1 was meant more generally: does gluster v4.0 etc. have a better performance than version 3.12 etc.? Just curious :-) Sooner or later we have to upgrade anyway. btw.: gluster12 was the node with the failed brick, and i started

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Pranith Kumar Karampuri
On Thu, Jul 26, 2018 at 12:59 PM, Hu Bert wrote: > Hi Pranith, > > thanks a lot for your efforts and for tracking "my" problem with an issue. > :-) > > I've set this params on the gluster volume and will start the > 'find...' command within a short time. I'll probably add another > answer to

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Hu Bert
Hi Pranith, thanks a lot for your efforts and for tracking "my" problem with an issue. :-) I've set this params on the gluster volume and will start the 'find...' command within a short time. I'll probably add another answer to the list to document the progress. btw. - you had some typos:

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-26 Thread Pranith Kumar Karampuri
Thanks a lot for the detailed write-up; this helps find the bottlenecks easily. At a high level, to handle this directory hierarchy, i.e. lots of directories with files, we need to improve the healing algorithms. Based on the data you provided, we need to make the following enhancements: 1) At the moment

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-25 Thread Hu Bert
Hi Pranith, Sorry, it took a while to count the directories. I'll try to answer your questions as well as possible. > What kind of data do you have? > How many directories in the filesystem? > On average how many files per directory? > What is the depth of your directory hierarchy on average? >

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-24 Thread Pranith Kumar Karampuri
On Mon, Jul 23, 2018 at 4:16 PM, Hu Bert wrote: > Well, over the weekend about 200GB were copied, so now there are > ~400GB copied to the brick. That's far below a speed of 10GB per > hour. If I copied the 1.6 TB directly, that would be done within max 2 > days. But with the self heal this will

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-23 Thread Hu Bert
Well, over the weekend about 200GB were copied, so now there are ~400GB copied to the brick. That's far below a speed of 10GB per hour. If I copied the 1.6 TB directly, that would be done within max 2 days. But with the self heal this will take at least 20 days. Why is the performance
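The estimate in this message can be checked with a little arithmetic. The sketch below assumes the weekend lasted about 72 hours and that roughly 1200 GB (1.6 TB minus the ~400 GB already copied) remain; both figures are assumptions, not numbers stated in the thread, and the function names are hypothetical.

```shell
#!/bin/sh
# heal_rate_gb_per_hour COPIED_GB HOURS
heal_rate_gb_per_hour() {
    LC_ALL=C awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.1f\n", gb / h }'
}

# heal_eta_days REMAINING_GB COPIED_GB HOURS
# Days needed for the remaining data at the observed rate.
heal_eta_days() {
    LC_ALL=C awk -v rem="$1" -v gb="$2" -v h="$3" \
        'BEGIN { printf "%.1f\n", rem / (gb / h) / 24 }'
}
```

Under those assumptions, `heal_rate_gb_per_hour 200 72` gives 2.8 GB per hour and `heal_eta_days 1200 200 72` gives 18.0 days, which is in line with the "about 20 days" expectation.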

Re: [Gluster-users] Gluter 3.12.12: performance during heal and in general

2018-07-20 Thread Hu Bert
hmm... no one any idea? Additional question: the hdd on server gluster12 was changed, so far ~220 GB were copied. On the other 2 servers i see a lot of entries in glustershd.log, about 312,000 and 336,000 entries respectively there yesterday, most of them (current log output) looking like this: