Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Hu Bert
FYI: we have 3 servers, each with 2 SW RAID10 arrays used as bricks in a
replica 3 setup (so 2 volumes); the default values set by the OS (Debian
Stretch) are:

/dev/md3
Array Size : 29298911232 (27941.62 GiB 30002.09 GB)
/sys/block/md3/queue/read_ahead_kb : 3027

/dev/md4
Array Size : 19532607488 (18627.75 GiB 20001.39 GB)
/sys/block/md4/queue/read_ahead_kb : 2048

maybe that helps somehow :)
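
A rough sketch of how those values were read (same device names as above):

  # array size as reported by mdadm
  mdadm --detail /dev/md3 | grep 'Array Size'
  # per-device read-ahead, in KB
  cat /sys/block/md3/queue/read_ahead_kb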

Hubert

On Wed, 13 Feb 2019 at 06:46, Manoj Pillai  wrote:
>
>
>
> On Wed, Feb 13, 2019 at 10:51 AM Raghavendra Gowdappa  
> wrote:
>>
>>
>>
>> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa  
>> wrote:
>>>
>>> All,
>>>
>>> We've found that the perf xlators io-cache and read-ahead do not add any
>>> performance improvement. At best read-ahead is redundant due to kernel read-ahead
>>
>>
>> One thing we are still figuring out is whether kernel read-ahead is tunable.
>> From what we've explored, it _looks_ like (though this may not be entirely
>> correct) ra is capped at 128KB. If that's the case, I am interested in a few things:
>> * Are there any real-world applications/use cases that would benefit from
>> larger read-ahead (Manoj says block devices can do ra of 4MB)?
>
>
> Kernel read-ahead is adaptive but influenced by the read-ahead setting on the
> block device (/sys/block//queue/read_ahead_kb), which can be tuned. For
> RHEL specifically, the default is 128KB (last I checked), but the default RHEL
> tuned profile, throughput-performance, bumps that up to 4MB. It should be
> fairly easy to rig up a test where 4MB read-ahead on the block device gives
> better performance than 128KB read-ahead.
>
> -- Manoj
>
>> * Is the limit on kernel ra tunable a hard one? IOW, what does it take to
>> make it do higher ra? If it's difficult, can glusterfs read-ahead provide
>> the expected performance improvement for those applications that would
>> benefit from aggressive ra (as glusterfs can support larger ra sizes)?
>>
>> I am still inclined to prefer kernel ra, as I think it's more intelligent and
>> can identify more sequential patterns than Glusterfs read-ahead [1][2].
>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
>> [2] https://lwn.net/Articles/155510/
>>
>>> and at worst io-cache degrades performance for workloads that
>>> don't involve re-reads. Given that the VFS already has both of these
>>> functionalities, I am proposing to have these two translators turned off by
>>> default for native fuse mounts.
>>>
>>> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep these
>>> xlators on via custom profiles. Comments?
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>>
>>> regards,
>>> Raghavendra
>


Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
On Wed, Feb 13, 2019 at 11:16 AM Manoj Pillai  wrote:

>
>
> On Wed, Feb 13, 2019 at 10:51 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
>> wrote:
>>
>>> All,
>>>
>>> We've found that the perf xlators io-cache and read-ahead do not add any
>>> performance improvement. At best read-ahead is redundant due to kernel
>>> read-ahead
>>>
>>
>> One thing we are still figuring out is whether kernel read-ahead is
>> tunable. From what we've explored, it _looks_ like (though this may not be
>> entirely correct) ra is capped at 128KB. If that's the case, I am interested
>> in a few things:
>> * Are there any real-world applications/use cases that would benefit from
>> larger read-ahead (Manoj says block devices can do ra of 4MB)?
>>
>
> Kernel read-ahead is adaptive but influenced by the read-ahead setting on
> the block device (/sys/block//queue/read_ahead_kb), which can be
> tuned. For RHEL specifically, the default is 128KB (last I checked), but the
> default RHEL tuned profile, throughput-performance, bumps that up to 4MB.
> It should be fairly easy to rig up a test where 4MB read-ahead on the
> block device gives better performance than 128KB read-ahead.
>

Thanks, Manoj. To add to what Manoj said and give more context here:
Glusterfs, being a fuse-based filesystem, is not exposed as a block device.
So the first problem is where/how to tune; I've listed other problems
earlier.
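
To illustrate the difference (a rough sketch; "sda" is just an example
device name):

  # a block device exposes a read-ahead knob under sysfs:
  cat /sys/block/sda/queue/read_ahead_kb
  # a glusterfs fuse mount is of type fuse.glusterfs and has no entry
  # under /sys/block, so there is no equivalent per-mount knob there:
  mount -t fuse.glusterfs
  ls /sys/block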


> -- Manoj
>
> * Is the limit on kernel ra tunable a hard one? IOW, what does it take to
>> make it do higher ra? If it's difficult, can glusterfs read-ahead provide
>> the expected performance improvement for those applications that would
>> benefit from aggressive ra (as glusterfs can support larger ra sizes)?
>>
>> I am still inclined to prefer kernel ra, as I think it's more intelligent
>> and can identify more sequential patterns than Glusterfs read-ahead [1][2].
>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
>> [2] https://lwn.net/Articles/155510/
>>
>> and at worst io-cache degrades performance for workloads that
>>> don't involve re-reads. Given that the VFS already has both of these
>>> functionalities, I am proposing to have these two translators turned off by
>>> default for native fuse mounts.
>>>
>>> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep
>>> these xlators on via custom profiles. Comments?
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>>
>>> regards,
>>> Raghavendra
>>>
>>

Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Manoj Pillai
On Wed, Feb 13, 2019 at 10:51 AM Raghavendra Gowdappa 
wrote:

>
>
> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> We've found that the perf xlators io-cache and read-ahead do not add any
>> performance improvement. At best read-ahead is redundant due to kernel
>> read-ahead
>>
>
> One thing we are still figuring out is whether kernel read-ahead is
> tunable. From what we've explored, it _looks_ like (though this may not be
> entirely correct) ra is capped at 128KB. If that's the case, I am interested
> in a few things:
> * Are there any real-world applications/use cases that would benefit from
> larger read-ahead (Manoj says block devices can do ra of 4MB)?
>

Kernel read-ahead is adaptive but influenced by the read-ahead setting on
the block device (/sys/block//queue/read_ahead_kb), which can be
tuned. For RHEL specifically, the default is 128KB (last I checked), but the
default RHEL tuned profile, throughput-performance, bumps that up to 4MB.
It should be fairly easy to rig up a test where 4MB read-ahead on the
block device gives better performance than 128KB read-ahead.
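
For example (a sketch; "sda" is a placeholder for the device backing the
brick):

  # check the current value, in KB
  cat /sys/block/sda/queue/read_ahead_kb
  # apply the RHEL throughput-performance profile, which raises it to 4MB
  tuned-adm profile throughput-performance
  # or set it directly for a one-off test (not persistent across reboots)
  echo 4096 > /sys/block/sda/queue/read_ahead_kb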

-- Manoj

* Is the limit on kernel ra tunable a hard one? IOW, what does it take to
> make it do higher ra? If it's difficult, can glusterfs read-ahead provide
> the expected performance improvement for those applications that would
> benefit from aggressive ra (as glusterfs can support larger ra sizes)?
>
> I am still inclined to prefer kernel ra, as I think it's more intelligent
> and can identify more sequential patterns than Glusterfs read-ahead [1][2].
> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
> [2] https://lwn.net/Articles/155510/
>
> and at worst io-cache degrades performance for workloads that
>> don't involve re-reads. Given that the VFS already has both of these
>> functionalities, I am proposing to have these two translators turned off by
>> default for native fuse mounts.
>>
>> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep
>> these xlators on via custom profiles. Comments?
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>
>> regards,
>> Raghavendra
>>
>

Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
wrote:

> All,
>
> We've found that the perf xlators io-cache and read-ahead do not add any
> performance improvement. At best read-ahead is redundant due to kernel
> read-ahead
>

One thing we are still figuring out is whether kernel read-ahead is
tunable. From what we've explored, it _looks_ like (though this may not be
entirely correct) ra is capped at 128KB. If that's the case, I am interested
in a few things:
* Are there any real-world applications/use cases that would benefit from
larger read-ahead (Manoj says block devices can do ra of 4MB)?
* Is the limit on kernel ra tunable a hard one? IOW, what does it take to
make it do higher ra? If it's difficult, can glusterfs read-ahead provide
the expected performance improvement for those applications that would
benefit from aggressive ra (as glusterfs can support larger ra sizes)?

I am still inclined to prefer kernel ra, as I think it's more intelligent and
can identify more sequential patterns than Glusterfs read-ahead [1][2].
[1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
[2] https://lwn.net/Articles/155510/
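
For anyone who wants to experiment with more aggressive read-ahead on the
glusterfs side, a sketch of the knobs involved (VOLNAME is a placeholder;
the page-count option is assumed to keep its documented default of 4 and
maximum of 16):

  gluster volume set VOLNAME performance.read-ahead on
  gluster volume set VOLNAME performance.read-ahead-page-count 16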

and at worst io-cache degrades performance for workloads that
> don't involve re-reads. Given that the VFS already has both of these
> functionalities, I am proposing to have these two translators turned off by
> default for native fuse mounts.
>
> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep
> these xlators on via custom profiles. Comments?
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>
> regards,
> Raghavendra
>

Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
On Tue, Feb 12, 2019 at 11:09 PM Darrell Budic 
wrote:

> Is there an example of a custom profile you can share for my ovirt use
> case (with gfapi enabled)?
>

I was speaking about a group setting like "group metadata-cache". It's just
the set of custom options one would turn on for a class of applications or
problems.
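
A rough sketch of what such a custom profile could look like (the file name
and option values are only illustrative; group files live on the servers
under /var/lib/glusterd/groups/):

  # /var/lib/glusterd/groups/gfapi-cache
  performance.read-ahead=on
  performance.io-cache=on

  # apply it to a volume (VOLNAME is a placeholder)
  gluster volume set VOLNAME group gfapi-cache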

Or are you just talking about the standard group settings for virt as a
> custom profile?
>
> On Feb 12, 2019, at 7:22 AM, Raghavendra Gowdappa 
> wrote:
>
> https://review.gluster.org/22203
>
> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> We've found that the perf xlators io-cache and read-ahead do not add any
>> performance improvement. At best read-ahead is redundant due to kernel
>> read-ahead, and at worst io-cache degrades performance for workloads
>> that don't involve re-reads. Given that the VFS already has both of these
>> functionalities, I am proposing to have these two translators turned off by
>> default for native fuse mounts.
>>
>> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep
>> these xlators on via custom profiles. Comments?
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>
>> regards,
>> Raghavendra
>>

Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Darrell Budic
Is there an example of a custom profile you can share for my oVirt use case
(with gfapi enabled)? Or are you just talking about the standard group settings
for virt as a custom profile?
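
For reference, the stock virt group is just a file of options on the servers,
applied per volume; a sketch, assuming the usual install path and with
VOLNAME as a placeholder:

  cat /var/lib/glusterd/groups/virt
  gluster volume set VOLNAME group virt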

> On Feb 12, 2019, at 7:22 AM, Raghavendra Gowdappa  wrote:
> 
> https://review.gluster.org/22203 
> 
> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa  > wrote:
> All,
> 
> We've found that the perf xlators io-cache and read-ahead do not add any
> performance improvement. At best read-ahead is redundant due to kernel
> read-ahead, and at worst io-cache degrades performance for workloads that
> don't involve re-reads. Given that the VFS already has both of these
> functionalities, I am proposing to have these two translators turned off by
> default for native fuse mounts.
> 
> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep these
> xlators on via custom profiles. Comments?
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029 
> 
> 
> regards,
> Raghavendra

Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
https://review.gluster.org/22203

On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
wrote:

> All,
>
> We've found that the perf xlators io-cache and read-ahead do not add any
> performance improvement. At best read-ahead is redundant due to kernel
> read-ahead, and at worst io-cache degrades performance for workloads
> that don't involve re-reads. Given that the VFS already has both of these
> functionalities, I am proposing to have these two translators turned off by
> default for native fuse mounts.
>
> For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep
> these xlators on via custom profiles. Comments?
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>
> regards,
> Raghavendra
>

[Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
All,

We've found that the perf xlators io-cache and read-ahead do not add any
performance improvement. At best read-ahead is redundant due to kernel
read-ahead, and at worst io-cache degrades performance for workloads that
don't involve re-reads. Given that the VFS already has both of these
functionalities, I am proposing to have these two translators turned off by
default for native fuse mounts.
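
For anyone who wants to try this before the defaults change, the per-volume
equivalent is (a sketch; VOLNAME is a placeholder):

  gluster volume set VOLNAME performance.read-ahead off
  gluster volume set VOLNAME performance.io-cache off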

For non-native-fuse mounts like gfapi (NFS-Ganesha/Samba) we can keep these
xlators on via custom profiles. Comments?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029

regards,
Raghavendra

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-12 Thread Nithya Balachandran
Not yet, but we are discussing an interim release. It is going to take a
couple of days to review the fixes, so not before then. We will update the
list with dates once we decide.


On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii 
wrote:

> Awesome. But is there a release schedule and an ETA for when these will be
> out in the repos?
>
> On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii 
>> wrote:
>>
>>> Great job identifying the issue!
>>>
>>> Any ETA on the next release with the logging and crash fixes in it?
>>>
>>
>> I've marked write-behind corruption as a blocker for release-6. Logging
>> fixes are already in codebase.
>>
>>
>>> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa 
>>> wrote:
>>>


 On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
 joao.ba...@neuro.fchampalimaud.org> wrote:

> Although I don't have these error messages, I'm having fuse crashes as
> frequent as you. I have disabled write-behind and the mount has been
> running over the weekend with heavy usage and no issues.
>

 The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya,
 and I were able to identify the corruption in write-behind.

 [1] https://review.gluster.org/22189


> I can provide coredumps before disabling write-behind if needed. I
> opened a BZ report
>  with the
> crashes that I was having.
>
> *João Baúto*
> ---
>
> *Scientific Computing and Software Platform*
> Champalimaud Research
> Champalimaud Center for the Unknown
> Av. Brasília, Doca de Pedrouços
> 1400-038 Lisbon, Portugal
> fchampalimaud.org 
>
>
> Artem Russakovskii  wrote on Saturday,
> 9/02/2019 at 22:18:
>
>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting
>> for the next crash to see if it dumps a core for you guys to remotely 
>> debug.
>>
>> Then I can consider setting performance.write-behind to off and
>> monitoring for further crashes.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>>
>>>
>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <
>>> archon...@gmail.com> wrote:
>>>
 Hi Nithya,

 I can try to disable write-behind as long as it doesn't heavily
 impact performance for us. Which option is it exactly? I don't see it set
 in my list of changed volume variables that I sent you guys earlier.

>>>
>>> The option is performance.write-behind
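>>> For example (a sketch; substitute your volume name for VOLNAME):
>>>   gluster volume set VOLNAME performance.write-behind off
>>>   gluster volume get VOLNAME performance.write-behind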
>>>
>>>
 Sincerely,
 Artem

 --
 Founder, Android Police , APK Mirror
 , Illogical Robot LLC
 beerpla.net | +ArtemRussakovskii
  | @ArtemR
 


 On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <
 nbala...@redhat.com> wrote:

> Hi Artem,
>
> We have found the cause of one crash. Unfortunately we have not
> managed to reproduce the one you reported, so we don't know if it is
> the same cause.
>
> Can you disable write-behind on the volume and let us know if it
> solves the problem? If yes, it is likely to be the same issue.
>
>
> regards,
> Nithya
>
> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <
> archon...@gmail.com> wrote:
>
>> Sorry to disappoint, but the crash just happened again, so
>> lru-limit=0 didn't help.
>>
>> Here's the snippet of the crash and the subsequent remount by
>> monit.
>>
>>
>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7f4402b99329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 
>> 0-_data1-replicate-0:
>> selecting local read_child _data1-client-3" repeated 39 times 
>> between