Re: [Gluster-devel] Glusterfs and Structured data

2018-10-07 Thread Raghavendra Gowdappa
+Gluster-users 

On Mon, Oct 8, 2018 at 9:34 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Feb 9, 2018 at 4:30 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> - Original Message -
>> > From: "Pranith Kumar Karampuri" 
>> > To: "Raghavendra G" 
>> > Cc: "Gluster Devel" 
>> > Sent: Friday, February 9, 2018 2:30:59 PM
>> > Subject: Re: [Gluster-devel] Glusterfs and Structured data
>> >
>> >
>> >
>> > On Thu, Feb 8, 2018 at 12:05 PM, Raghavendra G <
>> raghaven...@gluster.com >
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur < vbel...@redhat.com >
>> wrote:
>> >
>> >
>> >
>> >
>> >
>> > On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa <
>> rgowd...@redhat.com >
>> > wrote:
>> >
>> >
>> > All,
>> >
>> > While discussing an issue [2], one of our users pointed us to the
>> > documentation, which says that glusterfs is not good for storing
>> > "Structured data" [1].
>> >
>> >
>> > As far as I remember, the content around structured data in the Install
>> > Guide is from a FAQ that was circulated in Gluster, Inc., indicating the
>> > startup's market positioning. Most of that was based on not wanting to
>> > get into performance-based comparisons with the storage systems that are
>> > frequently seen in the structured-data space.
>> >
>> >
>> > Do any of you have more context on the feasibility of storing
>> > "structured data" on Glusterfs? Is one of the reasons for such a
>> > suggestion the "staleness of metadata" encountered in bugs like [3]?
>> >
>> >
>> > There are challenges that distributed storage systems face when exposed
>> > to applications that were written for a local filesystem interface. We
>> > have encountered problems with applications like tar [4] that are not in
>> > the realm of "Structured data". The common theme across all these
>> > problems is metadata and read-after-write consistency issues with the
>> > default translator stack that gets exposed on the client side. While the
>> > default stack is optimal for other scenarios, a category of applications
>> > needing strict metadata consistency is not well served by it. We have
>> > observed that disabling a few performance translators and tuning cache
>> > timeouts for VFS/FUSE helps to overcome some of these problems. The WIP
>> > effort on timestamp consistency across the translator stack, the patches
>> > merged as a result of the bugs you mention, and other fixes for
>> > outstanding issues should certainly help in serving these workloads
>> > better with the file interface.
>> >
>> > There are deployments I have come across where glusterfs is used for
>> > storing structured data. gluster-block and qemu-libgfapi overcome the
>> > metadata consistency problem by exposing a file as a block device and by
>> > disabling most of the performance translators in the default stack.
>> > Workloads that have been deemed problematic with the file interface, for
>> > the reasons alluded to above, function well with the block interface.
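Concretely, the knobs described above (disabling performance translators, tuning VFS/FUSE cache timeouts) look roughly like the following sketch. The volume name `myvol`, server name, and mount point are placeholders, and which options should actually be turned off depends on the workload:

```shell
# Disable the client-side caching/perf translators (the main sources of
# stale metadata in the default stack).
gluster volume set myvol performance.quick-read off
gluster volume set myvol performance.io-cache off
gluster volume set myvol performance.read-ahead off
gluster volume set myvol performance.stat-prefetch off   # md-cache
gluster volume set myvol performance.write-behind off

# Mount with zero kernel attribute/entry cache timeouts, so every
# stat/lookup reaches glusterfs instead of being answered by VFS/FUSE.
mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 \
      server1:/myvol /mnt/glusterfs
```

This trades performance for consistency, which is exactly the tension discussed in this thread.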
>> >
>> > I agree that gluster-block, since it uses only a subset of glusterfs
>> > fops (mostly reads/writes, I guess), runs into fewer consistency issues.
>> > However, as you've mentioned, we seem to have disabled the perf xlator
>> > stack in our tests/use-cases till now. Note that the perf xlator stack
>> > is one of the worst offenders as far as metadata consistency is
>> > concerned (with relatively fewer scenarios of data inconsistency). So, I
>> > wonder:
>> > * What would the scenario be if we enabled the perf xlator stack for
>> > gluster-block?
>> > * Is performance on gluster-block satisfactory enough that we don't need
>> > these xlators?
>> >   - Or is it that these xlators are not useful for the workloads usually
>> > run on gluster-block (for random read/write workloads, read/write
>> > caching xlators offer little or no advantage)?
>> >
>> > Yes. They are not useful. Block/VM files are opened with O_DIRECT, so
>> > we don't enable caching at any layer in glusterfs. md-cache could be
>> > useful for serving fstat from glusterfs. But apart from that I don't
>> > see any other xlator contributing much.
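The "staleness of metadata" problem, and why md-cache helps fstat only at a consistency cost, can be illustrated with a toy sketch. This is illustrative Python, not glusterfs code; the class and all names are invented. A timeout-based attribute cache answers stat from memory, so it can return stale data for up to `timeout` seconds after another client changes the file:

```python
import time

class MDCache:
    """Toy analogue of glusterfs md-cache: remembers stat results for
    `timeout` seconds and answers from cache without asking the brick."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._cache = {}  # path -> (stat_dict, time_cached)

    def stat(self, path, fetch_from_brick):
        entry = self._cache.get(path)
        if entry is not None:
            cached, at = entry
            if self.clock() - at < self.timeout:
                return cached  # may be stale if another client wrote meanwhile
        fresh = fetch_from_brick(path)
        self._cache[path] = (fresh, self.clock())
        return fresh

# Simulated brick (authoritative metadata) and a fake clock we control.
brick = {"/f": {"size": 0}}
now = [0.0]
cache = MDCache(timeout=1.0, clock=lambda: now[0])

s1 = cache.stat("/f", lambda p: dict(brick[p]))  # fetched from brick: size 0
brick["/f"]["size"] = 4096                       # "another client" appends
s2 = cache.stat("/f", lambda p: dict(brick[p]))  # stale answer: still size 0
now[0] += 2.0                                    # cache timeout expires
s3 = cache.stat("/f", lambda p: dict(brick[p]))  # refetched: size 4096
```

With `timeout=0` every stat goes to the brick, which is the consistency/performance trade-off the thread keeps returning to.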

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-16 Thread Raghavendra Gowdappa
On Sat, Feb 17, 2018 at 2:58 AM, Amar Tumballi <atumb...@redhat.com> wrote:

>
>
> On Thu, Feb 15, 2018 at 9:49 AM, Raghavendra Gowdappa <rgowd...@redhat.com
> > wrote:
>
>>
>>
>> On Wed, Feb 14, 2018 at 10:31 PM, Amar Tumballi <atumb...@redhat.com>
>> wrote:
>>
>>> Top posting as it is not exactly about in-consistency of perf layer.
>>>
>>> On the performance translators of Gluster, I am more interested to get
>>> work done on write-back caching layer, specially with using lease feature.
>>> Mainly because there are way too many usecases where a given directory
>>> would be used only by one client/application at the time.
>>>
>>
>> Can you explain a bit more in detail? Is it the problem of cached writes
>> reaching bricks after a lease is revoked (but application writes were done
>> when there was a valid lease)?
>>
>>
> I am not concerned on how quickly or slowly write reaches server if
> O_SYNC/O_DIRECT is not used. What I am expecting to see is, when there is
> just one client for the volume, for the below test case, no reads should
> ever reach bricks.
>
>   bash$ cd /mnt/glusterfs; cp /tmp/linux-tarball.tar.gz .; tar -xf
> linux-tarball.tar.gz; cd
>
> If above is properly achieved, totally happy. It may be already possible
> with current xlators, but if we can identify the options to achieve this,
> very happy. If not, lets fix that part of performance xlators.
>

This is not possible with just the current perf xlator stack. The stack is
designed such that the bricks are always considered the source of truth
(for both data and metadata). So whatever is cached is read at least once
from a brick (either through prefetching, as in read-ahead, or by caching
data for future access). What you describe is still possible by leveraging
the VFS page-cache.

However, write-behind could be extended to do this, though it might be a
bit complex, as it currently deals with lists of requests. To implement
this feature efficiently, write-behind would have to build up the notion of
a "file" (pages etc.). Note that for a volume accessed by just a single
client, leases are of no value. But I assume your intention is to achieve
the same goal even when multiple clients access a volume while only a
single client is active.

And, as you are aware, there is already an issue open for a unified caching
xlator :) [1]. One of its goals is to achieve exactly what you explained
above.

[1] https://github.com/gluster/glusterfs/issues/218
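The write-behind extension sketched above, buffering writes at page granularity and serving reads from the pending pages so that a tar-extract style workload never reads from the brick, can be illustrated with a toy sketch. This is illustrative Python, not the actual write-behind xlator; all names are invented and the read path is deliberately simplified:

```python
PAGE = 4096

class WriteBehind:
    """Toy write-behind that keeps a page map, so data still pending
    flush can satisfy reads without any brick I/O."""

    def __init__(self, flush_to_brick):
        self.flush_to_brick = flush_to_brick
        self.pages = {}  # page index -> bytearray(PAGE)

    def write(self, offset, data):
        # Ack immediately; copy bytes into page-granularity buffers.
        for i, byte in enumerate(data):
            off = offset + i
            page = self.pages.setdefault(off // PAGE, bytearray(PAGE))
            page[off % PAGE] = byte

    def read(self, offset, length, read_from_brick):
        # Simplification: only serves reads that fall inside one cached page.
        page = self.pages.get(offset // PAGE)
        start = offset % PAGE
        if page is not None and start + length <= PAGE:
            return bytes(page[start:start + length])  # no brick round-trip
        return read_from_brick(offset, length)

    def flush(self):
        for idx, page in sorted(self.pages.items()):
            self.flush_to_brick(idx * PAGE, bytes(page))
        self.pages.clear()

brick_io = []  # records every operation that actually reaches the brick
wb = WriteBehind(flush_to_brick=lambda off, data: brick_io.append(("write", off)))
wb.write(0, b"hello")
data = wb.read(0, 5, read_from_brick=lambda o, l: brick_io.append(("read", o)))
# The read was served from the cached page, so no brick I/O happened yet.
wb.flush()
```

The real work, as noted above, is everything this sketch omits: request ordering, partial pages backed by on-brick data, fsync/O_SYNC semantics, and invalidation when another client writes.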



Re: [Gluster-devel] Glusterfs and Structured data

2018-02-16 Thread Amar Tumballi
On Thu, Feb 15, 2018 at 9:49 AM, Raghavendra Gowdappa <rgowd...@redhat.com>
wrote:

>
>
> On Wed, Feb 14, 2018 at 10:31 PM, Amar Tumballi <atumb...@redhat.com>
> wrote:
>
>> Top posting as it is not exactly about in-consistency of perf layer.
>>
>> On the performance translators of Gluster, I am more interested to get
>> work done on write-back caching layer, specially with using lease feature.
>> Mainly because there are way too many usecases where a given directory
>> would be used only by one client/application at the time.
>>
>
> Can you explain a bit more in detail? Is it the problem of cached writes
> reaching bricks after a lease is revoked (but application writes were done
> when there was a valid lease)?
>
>
I am not concerned about how quickly or slowly a write reaches the server
when O_SYNC/O_DIRECT is not used. What I expect to see is that, when there
is just one client for the volume, no reads ever reach the bricks for the
test case below:

  bash$ cd /mnt/glusterfs; cp /tmp/linux-tarball.tar.gz .; tar -xf
linux-tarball.tar.gz; cd

If the above is properly achieved, I am totally happy. It may already be
possible with the current xlators; if we can identify the options to
achieve it, very happy. If not, let's fix that part of the performance
xlators.

Regards,
Amar



Re: [Gluster-devel] Glusterfs and Structured data

2018-02-14 Thread Raghavendra Gowdappa
On Wed, Feb 14, 2018 at 10:31 PM, Amar Tumballi <atumb...@redhat.com> wrote:

> Top posting as it is not exactly about in-consistency of perf layer.
>
> On the performance translators of Gluster, I am more interested to get
> work done on write-back caching layer, specially with using lease feature.
> Mainly because there are way too many usecases where a given directory
> would be used only by one client/application at the time.
>

Can you explain a bit more in detail? Is it the problem of cached writes
reaching bricks after a lease is revoked (but application writes were done
when there was a valid lease)?



Re: [Gluster-devel] Glusterfs and Structured data

2018-02-14 Thread Amar Tumballi
Top posting, as this is not exactly about the inconsistency of the perf
layer.

Of the performance translators in Gluster, I am most interested in getting
work done on a write-back caching layer, especially one using the lease
feature, mainly because there are many use cases where a given directory is
used by only one client/application at a time.

Regards,
Amar



Re: [Gluster-devel] Glusterfs and Structured data

2018-02-13 Thread Raghavendra G
I've started marking the "whiteboard" field of bugs in this class with the
tag "GLUSTERFS_METADATA_INCONSISTENCY". Please add the tag to any bugs you
deem fit.


Re: [Gluster-devel] Glusterfs and Structured data

2018-02-09 Thread Raghavendra Gowdappa


- Original Message -
> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> To: "Raghavendra G" <raghaven...@gluster.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, February 9, 2018 2:30:59 PM
> Subject: Re: [Gluster-devel] Glusterfs and Structured data
> 
> 
> 
> On Thu, Feb 8, 2018 at 12:05 PM, Raghavendra G < raghaven...@gluster.com >
> wrote:
> I agree that gluster-block due to its usage of a subset of glusterfs fops
> (mostly reads/writes I guess), runs into less number of consistency issues.
> However, as you've mentioned we seem to disable perf xlator stack in our
> tests/use-cases till now. Note that perf xlator stack is one of worst
> offenders as far as the metadata consistency is concerned (relatively less
> scenarios of data inconsistency). So, I wonder,
> * what would be the scenario if we enable perf xlator stack for
> gluster-block?
> * Is performance on gluster-block satisfactory so that we don't need these
> xlators?
> - Or is it that these xlators are not useful for the workload usually run on
> gluster-block (For random read/write workload, read/write caching xlators
> offer less or no advantage)?
> 
> Yes. They are not useful. Block/VM files are opened with O_DIRECT, so we
> don't enable caching at any layer in glusterfs. md-cache could be useful for
> serving fstat from glusterfs. But apart from that I don't see any other
> xlator contributing much.
> 
> 
> 
> - Or theoretically the workload ought to benefit from perf xlators, but we
> don't see that benefit in our results (there are open bugs to this effect)?
> 
> I am asking these questions to ascertain priority on fixing perf xlators for
> (meta)data inconsistencies. If we offer a different solution for these
> workloads, the need for fixing these issues will be less.
> 
> My personal opinion is that both block and fs should work correctly, i.e.
> caching xlators shouldn't lead to inconsistency issues. 

+1. That's my personal opinion too. We'll try to fix these issues. However, we 
need to qualify the fixes. It would be helpful if the community can help here. 
We'll let the community know when the fixes are in.

> It would be better
> if we are in a position where we choose a workload on block vs fs based on
> their performance for that workload and nothing else.

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-09 Thread Pranith Kumar Karampuri
On Thu, Feb 8, 2018 at 12:05 PM, Raghavendra G 
wrote:

>
>
> On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur  wrote:
>
>>
>>
>> On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa wrote:
>>
>>> All,
>>>
>>> One of our users pointed out to the documentation that glusterfs is not
>>> good for storing "Structured data" [1], while discussing an issue [2].
>>
>>
>>
>> As far as I remember, the content around structured data in the Install
>> Guide is from a FAQ that was being circulated in Gluster, Inc. indicating
>> the startup's market positioning. Most of that was based on not wanting to
>> get into performance based comparisons of storage systems that are
>> frequently seen in the structured data space.
>>
>>
>>> Does any of you have more context on the feasibility of storing
>>> "structured data" on Glusterfs? Is one of the reasons for such a suggestion
>>> "staleness of metadata" as encountered in bugs like [3]?
>>>
>>
>>
>> There are challenges that distributed storage systems face when exposed
>> to applications that were written for a local filesystem interface. We have
>> encountered problems with applications like tar [4] that are not in the
>> realm of "Structured data". If we look at the common theme across all these
>> problems, it is related to metadata & read after write consistency issues
>> with the default translator stack that gets exposed on the client side.
>> While the default stack is optimal for other scenarios, it does seem that a
>> category of applications needing strict metadata consistency is not well
>> served by that. We have observed that disabling a few performance
>> translators and tuning cache timeouts for VFS/FUSE have helped to overcome
>> some of them. The WIP effort on timestamp consistency across the translator
>> stack, patches that have been merged as a result of the bugs that you
>> mention & other fixes for outstanding issues should certainly help in
>> catering to these workloads better with the file interface.
>>
>> There are deployments that I have come across where glusterfs is used for
>> storing structured data. gluster-block  & qemu-libgfapi overcome the
>> metadata consistency problem by exposing a file as a block device & by
>> disabling most of the performance translators in the default stack.
>> Workloads that have been deemed problematic with the file interface for the
>> reasons alluded above, function well with the block interface.
>>
>
> I agree that gluster-block, due to its usage of a subset of glusterfs fops
> (mostly reads/writes I guess), runs into fewer consistency issues.
> However, as you've mentioned, we seem to have disabled the perf xlator stack
> in our tests/use-cases till now. Note that the perf xlator stack is one of
> the worst offenders as far as metadata consistency is concerned (with
> relatively fewer scenarios of data inconsistency). So, I wonder,
> * what would be the scenario if we enable perf xlator stack for
> gluster-block?
> * Is performance on gluster-block satisfactory so that we don't need these
> xlators?
>   - Or is it that these xlators are not useful for the workload usually
> run on gluster-block (For random read/write workload, read/write caching
> xlators offer less or no advantage)?
>

Yes. They are not useful. Block/VM files are opened with O_DIRECT, so we
don't enable caching at any layer in glusterfs. md-cache could be useful
for serving fstat from glusterfs. But apart from that I don't see any other
xlator contributing much.


>   - Or theoretically the workload ought to benefit from perf xlators,
> but we don't see that benefit in our results (there are open bugs to this effect)?
>
> I am asking these questions to ascertain priority on fixing perf xlators
> for (meta)data inconsistencies. If we offer a different solution for these
> workloads, the need for fixing these issues will be less.
>

My personal opinion is that both block and fs should work correctly, i.e.
caching xlators shouldn't lead to inconsistency issues. It would be better
if we are in a position where we choose a workload on block vs fs based on
their performance for that workload and nothing else. Block/VM use cases
change the workload of the application for glusterfs, so for small-file
operations the kind of performance you see on block can never be achieved
by glusterfs with the current architecture/design.


>
>> I feel that we have come a long way from the time the install guide was
>> written and an update for removing the "staleness of content" might be in
>> order there :-).
>>
>> Regards,
>> Vijay
>>
>> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1058526
>>
>>
>>>
>>> [1] http://docs.gluster.org/en/latest/Install-Guide/Overview/
>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1390050
>>>
>>> regards,
>>> Raghavendra

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-09 Thread Raghavendra Gowdappa
+gluster-users

Another guideline we can provide is to disable all performance xlators for 
workloads requiring strict metadata consistency (even for non gluster-block 
use cases like a native fuse mount). Note that we might still be able to keep 
a few perf xlators turned on, but that will require some experimentation. The 
safest and easiest option is to turn off the following xlators:

* performance.read-ahead
* performance.write-behind
* performance.readdir-ahead and performance.parallel-readdir
* performance.quick-read
* performance.stat-prefetch
* performance.io-cache
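
As a rough sketch (assuming a hypothetical volume named "gv0"; substitute your 
own volume name), the list above translates to the following CLI commands:

```shell
# Disable the performance xlators listed above on volume "gv0"
# (hypothetical name; run these against your own volume):
gluster volume set gv0 performance.read-ahead off
gluster volume set gv0 performance.write-behind off
gluster volume set gv0 performance.readdir-ahead off
gluster volume set gv0 performance.parallel-readdir off
gluster volume set gv0 performance.quick-read off
gluster volume set gv0 performance.stat-prefetch off
gluster volume set gv0 performance.io-cache off
```

Running "gluster volume get gv0 all" afterwards should show the effective 
values of these options.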

performance.open-behind can be left turned on if the application doesn't rely 
on a file remaining accessible through an fd opened on one mount point while 
the file is deleted from a different mount point. As far as metadata 
inconsistencies go, I am not aware of any issues with performance.open-behind.
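
To illustrate the POSIX behaviour in question (a plain local-filesystem 
sketch, not glusterfs-specific code): a file deleted while a process still 
holds an open fd stays readable through that fd, which is the semantics 
open-behind must preserve even across mount points:

```python
import os
import tempfile

# Local-filesystem sketch of the semantics open-behind must preserve:
# an already-open fd keeps the file's data reachable after unlink.
path = os.path.join(tempfile.mkdtemp(), "probe.txt")
with open(path, "w") as f:
    f.write("hello")

fd = os.open(path, os.O_RDONLY)
os.unlink(path)           # the name is gone from the namespace...
data = os.read(fd, 5)     # ...but the data is still readable via the fd
os.close(fd)
print(data.decode())      # hello
```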

Please note that, as has been pointed out by different mails in this thread, 
the perf xlators are one part (albeit a larger one) of the bigger problem of 
metadata inconsistency. 

regards,
Raghavendra

- Original Message -
> From: "Vijay Bellur" <vbel...@redhat.com>
> To: "Raghavendra G" <raghaven...@gluster.com>
> Cc: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Gluster Devel" 
> <gluster-devel@gluster.org>
> Sent: Friday, February 9, 2018 1:34:25 PM
> Subject: Re: [Gluster-devel] Glusterfs and Structured data
> 
> On Wed, Feb 7, 2018 at 10:35 PM, Raghavendra G <raghaven...@gluster.com>
> wrote:
> 
> >
> >
> > On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur <vbel...@redhat.com> wrote:
> >
> >>
> >>
> >> On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
> >>
> >>> All,
> >>>
> >>> One of our users pointed out to the documentation that glusterfs is not
> >>> good for storing "Structured data" [1], while discussing an issue [2].
> >>
> >>
> >>
> >> As far as I remember, the content around structured data in the Install
> >> Guide is from a FAQ that was being circulated in Gluster, Inc. indicating
> >> the startup's market positioning. Most of that was based on not wanting to
> >> get into performance based comparisons of storage systems that are
> >> frequently seen in the structured data space.
> >>
> >>
> >>> Does any of you have more context on the feasibility of storing
> >>> "structured data" on Glusterfs? Is one of the reasons for such a
> >>> suggestion
> >>> "staleness of metadata" as encountered in bugs like [3]?
> >>>
> >>
> >>
> >> There are challenges that distributed storage systems face when exposed
> >> to applications that were written for a local filesystem interface. We
> >> have
> >> encountered problems with applications like tar [4] that are not in the
> >> realm of "Structured data". If we look at the common theme across all
> >> these
> >> problems, it is related to metadata & read after write consistency issues
> >> with the default translator stack that gets exposed on the client side.
> >> While the default stack is optimal for other scenarios, it does seem that
> >> a
> >> category of applications needing strict metadata consistency is not well
> >> served by that. We have observed that disabling a few performance
> >> translators and tuning cache timeouts for VFS/FUSE have helped to overcome
> >> some of them. The WIP effort on timestamp consistency across the
> >> translator
> >> stack, patches that have been merged as a result of the bugs that you
> >> mention & other fixes for outstanding issues should certainly help in
> >> catering to these workloads better with the file interface.
> >>
> >> There are deployments that I have come across where glusterfs is used for
> >> storing structured data. gluster-block  & qemu-libgfapi overcome the
> >> metadata consistency problem by exposing a file as a block device & by
> >> disabling most of the performance translators in the default stack.
> >> Workloads that have been deemed problematic with the file interface for
> >> the
> >> reasons alluded above, function well with the block interface.
> >>
> >
> I agree that gluster-block, due to its usage of a subset of glusterfs fops
> (mostly reads/writes I guess), runs into fewer consistency issues.
> However, as you've mentioned, we seem to have disabled the perf xlator stack
> in our tests/use-cases till now.

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-09 Thread Vijay Bellur
On Wed, Feb 7, 2018 at 10:35 PM, Raghavendra G 
wrote:

>
>
> On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur  wrote:
>
>>
>>
>> On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa wrote:
>>
>>> All,
>>>
>>> One of our users pointed out to the documentation that glusterfs is not
>>> good for storing "Structured data" [1], while discussing an issue [2].
>>
>>
>>
>> As far as I remember, the content around structured data in the Install
>> Guide is from a FAQ that was being circulated in Gluster, Inc. indicating
>> the startup's market positioning. Most of that was based on not wanting to
>> get into performance based comparisons of storage systems that are
>> frequently seen in the structured data space.
>>
>>
>>> Does any of you have more context on the feasibility of storing
>>> "structured data" on Glusterfs? Is one of the reasons for such a suggestion
>>> "staleness of metadata" as encountered in bugs like [3]?
>>>
>>
>>
>> There are challenges that distributed storage systems face when exposed
>> to applications that were written for a local filesystem interface. We have
>> encountered problems with applications like tar [4] that are not in the
>> realm of "Structured data". If we look at the common theme across all these
>> problems, it is related to metadata & read after write consistency issues
>> with the default translator stack that gets exposed on the client side.
>> While the default stack is optimal for other scenarios, it does seem that a
>> category of applications needing strict metadata consistency is not well
>> served by that. We have observed that disabling a few performance
>> translators and tuning cache timeouts for VFS/FUSE have helped to overcome
>> some of them. The WIP effort on timestamp consistency across the translator
>> stack, patches that have been merged as a result of the bugs that you
>> mention & other fixes for outstanding issues should certainly help in
>> catering to these workloads better with the file interface.
>>
>> There are deployments that I have come across where glusterfs is used for
>> storing structured data. gluster-block  & qemu-libgfapi overcome the
>> metadata consistency problem by exposing a file as a block device & by
>> disabling most of the performance translators in the default stack.
>> Workloads that have been deemed problematic with the file interface for the
>> reasons alluded above, function well with the block interface.
>>
>
> I agree that gluster-block, due to its usage of a subset of glusterfs fops
> (mostly reads/writes I guess), runs into fewer consistency issues.
> However, as you've mentioned, we seem to have disabled the perf xlator stack
> in our tests/use-cases till now. Note that the perf xlator stack is one of
> the worst offenders as far as metadata consistency is concerned (with
> relatively fewer scenarios of data inconsistency). So, I wonder,
> * what would be the scenario if we enable perf xlator stack for
> gluster-block?
>


tcmu-runner opens block devices with O_DIRECT. So enabling perf xlators for
gluster-block would not make a difference, as translators like io-cache &
read-ahead do not enable caching for an open() with O_DIRECT. In addition,
since the bulk of the operations happen to be reads & writes on large files
with gluster-block, md-cache & quick-read are not appropriate for the stack
that tcmu-runner operates on.
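
As a toy illustration of that policy (a hypothetical sketch, not actual 
glusterfs translator code): a caching layer can simply inspect the open flags 
and bypass its cache when O_DIRECT is set:

```python
import os

# os.O_DIRECT is Linux-specific; fall back to the usual x86 value so the
# sketch stays runnable on other platforms.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)

def caching_enabled(open_flags: int) -> bool:
    # Mirrors the behaviour described above: io-cache/read-ahead style
    # caching is skipped for fds opened with O_DIRECT, which is how
    # tcmu-runner (and qemu) open block/VM files.
    return not (open_flags & O_DIRECT)

print(caching_enabled(os.O_RDWR))             # True: buffered open
print(caching_enabled(os.O_RDWR | O_DIRECT))  # False: cache bypassed
```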


> * Is performance on gluster-block satisfactory so that we don't need these
> xlators?
>   - Or is it that these xlators are not useful for the workload usually
> run on gluster-block (For random read/write workload, read/write caching
> xlators offer less or no advantage)?
>   - Or theoretically the workload ought to benefit from perf xlators,
> but we don't see that benefit in our results (there are open bugs to this effect)?
>


Owing to the reasons mentioned above, most performance xlators do not seem
very useful for gluster-block workloads.


 Regards,
Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-07 Thread Raghavendra G
On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur  wrote:

>
>
> On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> One of our users pointed out to the documentation that glusterfs is not
>> good for storing "Structured data" [1], while discussing an issue [2].
>
>
>
> As far as I remember, the content around structured data in the Install
> Guide is from a FAQ that was being circulated in Gluster, Inc. indicating
> the startup's market positioning. Most of that was based on not wanting to
> get into performance based comparisons of storage systems that are
> frequently seen in the structured data space.
>
>
>> Does any of you have more context on the feasibility of storing
>> "structured data" on Glusterfs? Is one of the reasons for such a suggestion
>> "staleness of metadata" as encountered in bugs like [3]?
>>
>
>
> There are challenges that distributed storage systems face when exposed to
> applications that were written for a local filesystem interface. We have
> encountered problems with applications like tar [4] that are not in the
> realm of "Structured data". If we look at the common theme across all these
> problems, it is related to metadata & read after write consistency issues
> with the default translator stack that gets exposed on the client side.
> While the default stack is optimal for other scenarios, it does seem that a
> category of applications needing strict metadata consistency is not well
> served by that. We have observed that disabling a few performance
> translators and tuning cache timeouts for VFS/FUSE have helped to overcome
> some of them. The WIP effort on timestamp consistency across the translator
> stack, patches that have been merged as a result of the bugs that you
> mention & other fixes for outstanding issues should certainly help in
> catering to these workloads better with the file interface.
>
> There are deployments that I have come across where glusterfs is used for
> storing structured data. gluster-block  & qemu-libgfapi overcome the
> metadata consistency problem by exposing a file as a block device & by
> disabling most of the performance translators in the default stack.
> Workloads that have been deemed problematic with the file interface for the
> reasons alluded above, function well with the block interface.
>

I agree that gluster-block, due to its usage of a subset of glusterfs fops
(mostly reads/writes I guess), runs into fewer consistency issues.
However, as you've mentioned, we seem to have disabled the perf xlator stack
in our tests/use-cases till now. Note that the perf xlator stack is one of
the worst offenders as far as metadata consistency is concerned (with
relatively fewer scenarios of data inconsistency). So, I wonder,
* what would be the scenario if we enable perf xlator stack for
gluster-block?
* Is performance on gluster-block satisfactory so that we don't need these
xlators?
  - Or is it that these xlators are not useful for the workload usually run
on gluster-block (For random read/write workload, read/write caching
xlators offer less or no advantage)?
  - Or theoretically the workload ought to benefit from perf xlators,
but we don't see that benefit in our results (there are open bugs to this effect)?

I am asking these questions to ascertain priority on fixing perf xlators
for (meta)data inconsistencies. If we offer a different solution for these
workloads, the need for fixing these issues will be less.

> I feel that we have come a long way from the time the install guide was
> written and an update for removing the "staleness of content" might be in
> order there :-).
>
> Regards,
> Vijay
>
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1058526
>
>
>>
>> [1] http://docs.gluster.org/en/latest/Install-Guide/Overview/
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1390050
>>
>> regards,
>> Raghavendra
>>
>
>
>



-- 
Raghavendra G

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-06 Thread Vijay Bellur
On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa 
wrote:

> All,
>
> One of our users pointed out to the documentation that glusterfs is not
> good for storing "Structured data" [1], while discussing an issue [2].



As far as I remember, the content around structured data in the Install
Guide is from a FAQ that was being circulated in Gluster, Inc. indicating
the startup's market positioning. Most of that was based on not wanting to
get into performance based comparisons of storage systems that are
frequently seen in the structured data space.


> Does any of you have more context on the feasibility of storing
> "structured data" on Glusterfs? Is one of the reasons for such a suggestion
> "staleness of metadata" as encountered in bugs like [3]?
>


There are challenges that distributed storage systems face when exposed to
applications that were written for a local filesystem interface. We have
encountered problems with applications like tar [4] that are not in the
realm of "Structured data". If we look at the common theme across all these
problems, it is related to metadata & read after write consistency issues
with the default translator stack that gets exposed on the client side.
While the default stack is optimal for other scenarios, it does seem that a
category of applications needing strict metadata consistency is not well
served by that. We have observed that disabling a few performance
translators and tuning cache timeouts for VFS/FUSE have helped to overcome
some of them. The WIP effort on timestamp consistency across the translator
stack, patches that have been merged as a result of the bugs that you
mention & other fixes for outstanding issues should certainly help in
catering to these workloads better with the file interface.

There are deployments that I have come across where glusterfs is used for
storing structured data. gluster-block & qemu-libgfapi overcome the
metadata consistency problem by exposing a file as a block device & by
disabling most of the performance translators in the default stack.
Workloads that have been deemed problematic with the file interface for the
reasons alluded to above function well with the block interface. I feel that
we have come a long way from the time the install guide was written, and an
update removing the "staleness of content" might be in order there :-).

Regards,
Vijay

[4] https://bugzilla.redhat.com/show_bug.cgi?id=1058526


>
> [1] http://docs.gluster.org/en/latest/Install-Guide/Overview/
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1390050
>
> regards,
> Raghavendra
>

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-06 Thread Amar Tumballi
On Tue, Feb 6, 2018 at 3:15 PM, Pranith Kumar Karampuri  wrote:

>
>
> On Sun, Feb 4, 2018 at 5:09 PM, Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> One of our users pointed out to the documentation that glusterfs is not
>> good for storing "Structured data" [1], while discussing an issue [2]. Does
>> any of you have more context on the feasibility of storing "structured
>> data" on Glusterfs? Is one of the reasons for such a suggestion "staleness
>> of metadata" as encountered in bugs like [3]?
>>
>
> I think the default configuration of glusterfs leads to unwanted behaviour
> with structured data workloads. Based on my experience with customers I
> handled, structured data usecase needs stronger read after write
> consistency guarantees from multiple clients which I don't think we got a
> chance to qualify. People who get it working generally disable perf
> xlators. Then there are some issues that happen because of distributed
> nature of glusterfs (For example: ctime).  So technically we can get
> glusterfs to work for structured data workloads. We will need to find and
> fix issues by trying out that workload. Performance qualification on that
> will also be useful to analyse.
>
>
A note on this for people who are not aware of the 'gluster-block' project:

https://github.com/gluster/gluster-block could be used to handle some such
workloads. If anyone runs into issues on top of a Gluster filesystem mount,
then feel free to try gluster-block out, and give us feedback to help us
improve it.

-Amar


>> [1] http://docs.gluster.org/en/latest/Install-Guide/Overview/
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1390050
>>
>> regards,
>> Raghavendra
>>
>
>
>
> --
> Pranith
>
>



-- 
Amar Tumballi (amarts)

Re: [Gluster-devel] Glusterfs and Structured data

2018-02-06 Thread Pranith Kumar Karampuri
On Sun, Feb 4, 2018 at 5:09 PM, Raghavendra Gowdappa 
wrote:

> All,
>
> One of our users pointed out to the documentation that glusterfs is not
> good for storing "Structured data" [1], while discussing an issue [2]. Does
> any of you have more context on the feasibility of storing "structured
> data" on Glusterfs? Is one of the reasons for such a suggestion "staleness
> of metadata" as encountered in bugs like [3]?
>

I think the default configuration of glusterfs leads to unwanted behaviour
with structured data workloads. Based on my experience with the customers I
handled, the structured data use case needs stronger read-after-write
consistency guarantees across multiple clients, which I don't think we got a
chance to qualify. People who get it working generally disable perf
xlators. Then there are some issues that happen because of the distributed
nature of glusterfs (for example: ctime). So technically we can get
glusterfs to work for structured data workloads. We will need to find and
fix issues by trying out that workload. Performance qualification on that
workload will also be useful to analyse.
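
A minimal sketch of such a qualification probe (hypothetical helper names; in 
a real run mount_a and mount_b would be two separate client mounts of the 
same volume):

```python
import os
import tempfile
import time

def read_after_write_consistent(mount_a: str, mount_b: str) -> bool:
    """Write through one client path, then immediately check data and
    metadata through a second client path."""
    name = "raw-probe-%d" % os.getpid()
    payload = ("payload-%f" % time.time()).encode()
    path_a = os.path.join(mount_a, name)
    path_b = os.path.join(mount_b, name)
    with open(path_a, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())           # make the write durable first
    st = os.stat(path_b)               # metadata as seen by the 2nd client
    with open(path_b, "rb") as f:
        data = f.read()                # data as seen by the 2nd client
    return st.st_size == len(payload) and data == payload

# With glusterfs these would be two distinct FUSE mountpoints of one
# volume, e.g. /mnt/gv0-c1 and /mnt/gv0-c2 (hypothetical paths); on a
# single local directory the check passes trivially.
probe_dir = tempfile.mkdtemp()
print(read_after_write_consistent(probe_dir, probe_dir))  # True
```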


> [1] http://docs.gluster.org/en/latest/Install-Guide/Overview/
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1390050
>
> regards,
> Raghavendra
>



-- 
Pranith

[Gluster-devel] Glusterfs and Structured data

2018-02-04 Thread Raghavendra Gowdappa
All,

While discussing an issue [2], one of our users pointed to documentation 
stating that glusterfs is not good for storing "Structured data" [1]. Do any 
of you have more context on the feasibility of storing "structured data" on 
Glusterfs? Is one of the reasons for such a suggestion the "staleness of 
metadata" encountered in bugs like [3]?

[1] http://docs.gluster.org/en/latest/Install-Guide/Overview/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1390050

regards,
Raghavendra