Re: [Gluster-devel] Removing glupy from release 5.7

2019-07-03 Thread Michael Scherer
On Wednesday, 3 July 2019 at 20:03 +0530, Deepshikha Khandelwal wrote:
> Misc, was EPEL recently installed on the builders?

No, it has been there since September 2016. What changed is that python3
was not installed before.

> Can you please address the 'Why EPEL on builders?' question. EPEL plus
> python3 on the builders does not seem like a good option to have.


Python 3 is pulled in by 'mock'; see
https://lists.gluster.org/pipermail/gluster-devel/2019-June/056347.html

So sure, I can remove EPEL, but then that will also remove mock. Or I can
remove python3, and that will remove mock too.

But again, the problem is not the set of packages installed on the
builder; that is just a symptom of the bug.

The configure script picks the latest Python version:
https://github.com/gluster/glusterfs/blob/master/configure.ac#L612

If there is a python3, it takes that; if not, it falls back to python2.

Then, later:
https://github.com/gluster/glusterfs/blob/master/configure.ac#L639

it verifies the presence of what is required to build.

So if only the runtime version of python3 is present, it will detect
python3 but not build anything, because the -devel subpackage is not
there.

There are two solutions (a sketch of the first follows below):
- fix that piece of code so it does not just test for the presence of the
python executable, but does that and also tests for the presence of the
headers before deciding whether or not to build glupy.

- use the PYTHON env var to force python2, and document that this needs
to be done.
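For illustration, a minimal configure-time sketch of the first option,
written as the plain shell that configure.ac expands to (the PYTHON and
BUILD_GLUPY variable names are assumptions for the sketch, not the
project's actual code):

    # only enable glupy when the chosen interpreter's C headers exist,
    # instead of trusting the presence of the interpreter alone
    PYTHON_INCDIR=`$PYTHON -c 'import sysconfig; print(sysconfig.get_paths()["include"])'`
    if test -n "$PYTHON_INCDIR" && test -f "$PYTHON_INCDIR/Python.h"; then
      BUILD_GLUPY=yes
    else
      BUILD_GLUPY=no  # runtime-only python: skip glupy instead of failing
    fi

The second option needs no code change: running 'PYTHON=/usr/bin/python2
./configure' on the builders pins the detection to python2; it just has
to be documented.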


> On Thu, Jun 20, 2019 at 6:37 PM Michael Scherer wrote:
> 
> > Le jeudi 20 juin 2019 à 08:38 -0400, Kaleb Keithley a écrit :
> > > On Thu, Jun 20, 2019 at 7:39 AM Michael Scherer <msche...@redhat.com> wrote:
> > > 
> > > > On Thursday, 20 June 2019 at 06:57 -0400, Kaleb Keithley wrote:
> > > > > AFAICT, working fine right up to when EPEL and python3 were
> > > > > installed on the centos builders. If it was my decision, I'd
> > > > > undo that change.
> > > > 
> > > > The biggest problem is that mock does pull python3.
> > > > 
> > > > 
> > > 
> > > That's mock on Fedora — to run a build in a centos-i386 chroot.
> > > Fedora already has python3. I don't see how that can affect what's
> > > running in the mock chroot.
> > 
> > I am not sure we are talking about the same thing, but mock, the
> > rpm package from EPEL 7, does pull python 3:
> > 
> > $ cat /etc/redhat-release;   rpm -q --requires mock |grep
> > 'python(abi'
> > Red Hat Enterprise Linux Server release 7.6 (Maipo)
> > python(abi) = 3.6
> > 
> > So we do have python3 installed on the CentOS 7 builders (it arrived
> > after an upgrade), and we are not going to remove it, because we use
> > mock for a lot of stuff.
> > 
> > And again, if the configure script is detecting the wrong version of
> > python, the fix is not to remove that version of python from the
> > builders; the fix is to detect the right version of python, or at
> > least to permit people to bypass the detection.
> > 
> > > Is the build inside mock also installing EPEL and python3 somehow?
> > > Now? If so, why?
> > 
> > No, I doubt it, but then, if we are using a chroot, the packages
> > installed on the builders shouldn't matter, since it's a chroot.
> > 
> > So I am kind of lost.
> > 
> > > And maybe the solution for centos regressions is to run those in
> > > mock, with a centos-x86_64 chroot. Without EPEL or python3.
> > 
> > That would likely require a big refactoring of the setup, since we
> > have to get the data out of specific places, etc. We would also need
> > to reinstall the builders to partition them in a different way, with
> > a bigger / and/or more space for /var/lib/mock.
> > 
> > I do not see that happening fast, and if my hypothesis of an issue
> > in configure is right, then fixing it seems the fastest way to avoid
> > the issue.
> > --
> > Michael Scherer
> > Sysadmin, Community Infrastructure
> > 
-- 
Michael Scherer
Sysadmin, Community Infrastructure





Re: [Gluster-devel] fallocate behavior in glusterfs

2019-07-03 Thread Pranith Kumar Karampuri
On Wed, Jul 3, 2019 at 10:14 AM Ravishankar N wrote:

>
> On 02/07/19 8:52 PM, FNU Raghavendra Manjunath wrote:
>
>
> Hi All,
>
> In glusterfs, there is an issue regarding the fallocate behavior. In
> short, if someone does fallocate from the mount point with a size greater
> than the space available in the backend filesystem where the file
> resides, fallocate can fail in the backend filesystem with ENOSPC after
> having allocated only a subset of the required number of blocks.
>
> The behavior of fallocate in itself is similar to how it would have been
> on a disk filesystem (at least XFS, where it was checked): it allocates a
> subset of the required number of blocks and then fails with ENOSPC, and
> stat on the file shows the number of blocks that were allocated as part
> of fallocate. Please refer to [1], where the issue is explained.
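For illustration, a minimal reproduction sketch of the behavior described
above (plain client code, not gluster internals; the 10 GB length is an
arbitrary value assumed to be larger than the free space on the target
filesystem):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
            struct stat st;
            off_t len = 10LL * 1024 * 1024 * 1024; /* assumed > free space */
            int fd;

            if (argc < 2)
                    return EXIT_FAILURE;
            fd = open(argv[1], O_CREAT | O_RDWR, 0644);
            if (fd < 0) {
                    perror("open");
                    return EXIT_FAILURE;
            }
            if (fallocate(fd, 0, 0, len) < 0)
                    perror("fallocate"); /* ENOSPC expected */
            if (fstat(fd, &st) == 0)
                    /* non-zero on plain XFS; zero through a glusterfs mount */
                    printf("st_blocks = %lld\n", (long long)st.st_blocks);
            close(fd);
            return 0;
    }

Run it against a file on the nearly full volume, e.g.
'./repro /mnt/glustervol/testfile' (path hypothetical).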
>
> Now, there is one small difference in behavior between glusterfs and XFS.
> In XFS, after fallocate fails, doing 'stat' on the file shows the number
> of blocks that have been allocated. In glusterfs, the number of blocks is
> shown as zero, which makes tools like "du" show zero consumption. This
> difference comes from how libglusterfs handles sparse files etc. when
> calculating the number of blocks (mentioned in [1]).
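The tweak in question is, roughly, a clamp of the reported block count to
what the file size implies. Here is a hedged sketch of that idea, an
assumption about the commit Ravi mentions below rather than its literal
code (iatt carries a size and a 512-byte-unit block count, much like
struct stat):

    #include <stdint.h>

    /* Sketch: never report more blocks than the file size accounts for,
     * so XFS speculative preallocation does not inflate du. The side
     * effect discussed in this thread: a file whose size stayed 0 after
     * a failed fallocate (or FALLOC_FL_KEEP_SIZE) reports 0 blocks. */
    static uint64_t
    clamp_blocks(uint64_t ia_size, uint64_t ia_blocks)
    {
            uint64_t size_blocks = (ia_size + 511) / 512; /* 512B units */

            return ia_blocks > size_blocks ? size_blocks : ia_blocks;
    }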
>
> At this point I can think of three ways to handle this.
>
> 1) Except for the number of blocks shown in the stat output for the file
> at the mount point (on which fallocate was done), the remaining behavior,
> i.e. attempting to allocate the requested size and failing when the
> filesystem becomes full, is already similar to that of XFS.
>
> Hence, what is required is a solution for how libglusterfs calculates
> blocks for sparse files etc. (without breaking any of the existing
> components and features). That would make the behavior similar to that
> of the backend filesystem, but it might take its own time to fix the
> libglusterfs logic without impacting anything else.
>
> I think we should just revert commit
> b1a5fa55695f497952264e35a9c8eb2bbf1ec4c3 (BZ 817343) and see if it really
> breaks anything (or check whether whatever it breaks is something we can
> live with). XFS speculative preallocation is not permanent, and the extra
> space is freed up eventually. It can be sped up via a procfs tunable:
> http://xfs.org/index.php/XFS_FAQ#Q:_How_can_I_speed_up_or_avoid_delayed_removal_of_speculative_preallocation.3F.
> We could also tune the allocsize option to a low value like 4k so that
> glusterfs quota is not affected.
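For reference, the two knobs Ravi mentions look roughly like this; the
value and the brick device/mount point are made-up examples, so check the
linked FAQ for the exact semantics on a given kernel:

    # reclaim leftover speculative preallocation sooner (value in seconds)
    echo 30 > /proc/sys/fs/xfs/speculative_prealloc_lifetime

    # or cap XFS preallocation with a small fixed allocsize at mount time
    mount -o allocsize=4k /dev/sdb1 /bricks/brick1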
>
> FWIW, ENOSPC is not the only fallocate problem in gluster because of
> 'iatt->ia_block' tweaking. It also breaks the --keep-size option (i.e. the
> FALLOC_FL_KEEP_SIZE flag in fallocate(2)) and reports incorrect du size.
>
> Regards,
> Ravi
>
>
> OR
>
> 2) Once the fallocate fails in the backend filesystem, make the posix
> xlator in the brick truncate the file back to the size it had before
> fallocate was attempted. A patch [2] has been sent for this. But there is
> an issue with this when parallel writes and fallocate operations happen
> on the same file: it can lead to data loss (see the sketch after this
> list).
>
> a) statpre is obtained ===> before fallocate is attempted, get the stat
> and hence the size of the file
> b) a parallel write fop on the same file that extends the file succeeds
> c) fallocate fails
> d) ftruncate truncates the file to the size given by statpre (i.e. the
> stat, and hence the size, obtained in step a), discarding the data
> written in step b)
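To make the race concrete, a hedged sketch of the rollback idea from [2]
(plain C pseudologic, not the actual posix xlator code; the comments map
onto steps a) through d) above):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int
    fallocate_with_rollback(int fd, off_t offset, off_t len)
    {
            struct stat pre;

            if (fstat(fd, &pre) < 0)                /* a) capture old size */
                    return -1;
            /* b) a parallel extending write can land right here */
            if (fallocate(fd, 0, offset, len) == 0) /* c) may hit ENOSPC */
                    return 0;
            /* d) rollback also truncates away the data written in b) */
            (void)ftruncate(fd, pre.st_size);
            return -1;
    }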
>
> OR
>
> 3) Make posix check the available disk space before doing fallocate,
> i.e. in fallocate, once posix gets the number of bytes to be allocated
> for the file from a particular offset, it checks whether that many bytes
> are available on the disk. If not, fail the fallocate fop with ENOSPC
> (without attempting it on the backend filesystem).
>
> There is still a probability of a parallel write happening while this
> fallocate is in progress, so that by the time the fallocate system call
> is attempted on the disk, the available space is less than what was
> calculated before fallocate.
> I.e., the following can happen:
>
>  a) statfs ===> get the available space of the backend filesystem
>  b) a parallel write succeeds and extends the file
>  c) fallocate is attempted assuming there is sufficient space in the
> backend
>
> While the above situation can arise, I think we are still fine, because
> fallocate is attempted from the offset received in the fop. So,
> irrespective of whether the write extended the file or not, the fallocate
> itself will be attempted only for the number of bytes from that offset
> which the statfs information showed to be available.
>
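A hedged sketch of option 3's pre-check (again plain C rather than the
actual posix xlator code; the comments mark the a), b), c) window
described above):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/statvfs.h>
    #include <unistd.h>

    static int
    fallocate_with_space_check(int fd, off_t offset, off_t len)
    {
            struct statvfs vfs;

            if (fstatvfs(fd, &vfs) < 0)             /* a) statfs */
                    return -1;
            if ((unsigned long long)len >
                (unsigned long long)vfs.f_bavail * vfs.f_frsize) {
                    errno = ENOSPC; /* fail without touching the backend */
                    return -1;
            }
            /* b) a parallel write may still consume this space... */
            return fallocate(fd, 0, offset, len);   /* c) ...before this */
    }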
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1724754#c3
> [2] https://review.gluster.org/#/c/glusterfs/+/22969/
>
>
Option 2) will affect performance if we have to serialize all the data
operations on the file.
Option 3) can still lead to the same problem we