Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-10-02 Thread Joshua Harlow
A library that seems to have a pretty nice abstraction for this kind of
thing/pattern, could be an idea to use it, or have something similar
like it...

https://boltons.readthedocs.org/en/latest/fileutils.html#atomic-file-saving

-Josh

On Sat, 26 Sep 2015 10:48:39 +0200
Julien Danjou  wrote:

> On Tue, Sep 22 2015, Chris Friesen wrote:
> 
> > On 09/22/2015 05:48 PM, Joshua Harlow wrote:
> >> A present:
> >>
> >>  >>> import contextlib
> >>  >>> import os
> >>  >>>
> >>  >>> @contextlib.contextmanager
> >> ... def synced_file(path, mode='wb'):
> >> ...   with open(path, mode) as fh:
> >> ...  yield fh
> >> ...  os.fdatasync(fh.fileno())
> >> ...
> >>  >>> with synced_file("/tmp/b.txt") as fh:
> >> ...fh.write("b")
> >
> > Isn't that missing an "fh.flush()" somewhere before the fdatasync()?
> 
> Unless proven otherwise, close() does a flush().
> 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-27 Thread Chris Friesen

On 09/26/2015 02:48 AM, Julien Danjou wrote:

On Tue, Sep 22 2015, Chris Friesen wrote:


On 09/22/2015 05:48 PM, Joshua Harlow wrote:

A present:

  >>> import contextlib
  >>> import os
  >>>
  >>> @contextlib.contextmanager
... def synced_file(path, mode='wb'):
...   with open(path, mode) as fh:
...  yield fh
...  os.fdatasync(fh.fileno())
...
  >>> with synced_file("/tmp/b.txt") as fh:
...fh.write("b")


Isn't that missing an "fh.flush()" somewhere before the fdatasync()?


Unless proven otherwise, close() does a flush().


There's no close() before the fdatasync() in the above code.  (And it wouldn't 
make sense anyway because you need the open fd to do the fdatasync().)


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-26 Thread Julien Danjou
On Tue, Sep 22 2015, Chris Friesen wrote:

> On 09/22/2015 05:48 PM, Joshua Harlow wrote:
>> A present:
>>
>>  >>> import contextlib
>>  >>> import os
>>  >>>
>>  >>> @contextlib.contextmanager
>> ... def synced_file(path, mode='wb'):
>> ...   with open(path, mode) as fh:
>> ...  yield fh
>> ...  os.fdatasync(fh.fileno())
>> ...
>>  >>> with synced_file("/tmp/b.txt") as fh:
>> ...fh.write("b")
>
> Isn't that missing an "fh.flush()" somewhere before the fdatasync()?

Unless proven otherwise, close() does a flush().

-- 
Julien Danjou
;; Free Software hacker
;; http://julien.danjou.info


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-25 Thread Mitsuhiro Tanino
> -Original Message-
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Friday, September 25, 2015 3:04 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi
> config file?
> 
> On 09/25/2015 12:30 PM, Mitsuhiro Tanino wrote:
> > On 09/22/2015 06:43 PM, Robert Collins wrote:
> >> On 23 September 2015 at 09:52, Chris Friesen
> >>  wrote:
> >>> Hi,
> >>>
> >>> I recently had an issue with one file out of a dozen or so in
> >>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero.
> >>> I'm running stable/kilo if it makes a difference.
> >>>
> >>> Looking at the code in
> >>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we
> >>> should do a fsync() before the close().  The way it stands now, it
> >>> seems like it might be possible to write the file, start making use
> >>> of it, and then take a power outage before it actually gets written
> >>> to persistent storage.  When we come back up we could have an
> >>> instance expecting to make use of it, but no target information in the on-
> disk copy of the file.
> >
> > I think even if there is no target information in configuration file
> > dir, c-vol started successfully and iSCSI targets were created automatically
> and volumes were exported, right?
> >
> > There is an problem in this case that the iSCSI target was created
> > without authentication because we can't get previous authentication from the
> configuration file.
> >
> > I'm curious what kind of problem did you met?
> 
> We had an issue in a private patch that was ported to Kilo without realizing
> that the data type of chap_auth had changed.

I understand. Thank you for your explanation.
 
> > In my understanding, the provider_auth in database has user name and 
> > password
> for iSCSI target.
> > Therefore if we get authentication from DB, I think we can self-heal
> > from this situation correctly after c-vol service is restarted.
> >
> > The lio target obtains authentication from provider_auth in database,
> > but tgtd, iet, cxt obtain authentication from file to recreate iSCSI target
> when c-vol is restarted.
> > If the file is missing, these volumes are exported without
> > authentication and configuration file is recreated as I mentioned above.
> >
> > tgtd: Get target chap auth from file
> > iet:  Get target chap auth from file
> > cxt:  Get target chap auth from file
> > lio:  Get target chap auth from Database(in provider_auth)
> > scst: Get target chap auth by using original command
> >
> > If we get authentication from DB for tgtd, iet and cxt same as lio, we
> > can recreate iSCSI target with proper authentication when c-vol is 
> > restarted.
> > I think this is a solution for this situation.
> 
> If we fixed the chap auth info then we could live with a zero-size file.
> However, with the current code if we take a kernel panic or power outage it's
> theoretically possible to end up with a corrupt file of nonzero size (due to
> metadata hitting the persistant storage before the data).  I'm not confident
> that the current code would deal properly with that.
> 
> That said, if we always regenerate every file from the DB on cinder-volume
> startup (regardless of whether or not it existed, and without reading in the
> existing file), then we'd be okay without the robustness improvements.

This file is referred when the SCSI target service is restarted.
Therefore, adding robustness for this file is also good approach. IMO.

Thanks,
Mitsuhiro Tanino

> Chris
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_cgi-
> 2Dbin_mailman_listinfo_openstack-2Ddev&d=BQICAg&c=DZ-
> EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SZPjS9uXH42q
> hmzSRbSZ8x39C9xi3aBDw-SQ7xa8cTM&s=XWJ91NIJglFkBSr762rSq9TdWeiRSdS5Pl0LzS1_1Z8&e=

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-25 Thread Mitsuhiro Tanino
> -Original Message-
> From: Eric Harney [mailto:ehar...@redhat.com]
> Sent: Friday, September 25, 2015 2:56 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi
> config file?
> 
> On 09/25/2015 02:30 PM, Mitsuhiro Tanino wrote:
> > On 09/22/2015 06:43 PM, Robert Collins wrote:
> >> On 23 September 2015 at 09:52, Chris Friesen
> >>  wrote:
> >>> Hi,
> >>>
> >>> I recently had an issue with one file out of a dozen or so in
> >>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero.
> >>> I'm running stable/kilo if it makes a difference.
> >>>
> >>> Looking at the code in
> >>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we
> >>> should do a fsync() before the close().  The way it stands now, it
> >>> seems like it might be possible to write the file, start making use
> >>> of it, and then take a power outage before it actually gets written
> >>> to persistent storage.  When we come back up we could have an
> >>> instance expecting to make use of it, but no target information in the on-
> disk copy of the file.
> >
> > I think even if there is no target information in configuration file
> > dir, c-vol started successfully and iSCSI targets were created automatically
> and volumes were exported, right?
> >
> > There is an problem in this case that the iSCSI target was created
> > without authentication because we can't get previous authentication from the
> configuration file.
> >
> > I'm curious what kind of problem did you met?
> >
> >> If its being kept in sync with DB records, and won't self-heal from
> >> this situation, then yes. e.g. if the overall workflow is something
> >> like
> >
> > In my understanding, the provider_auth in database has user name and 
> > password
> for iSCSI target.
> > Therefore if we get authentication from DB, I think we can self-heal
> > from this situation correctly after c-vol service is restarted.
> >
> 
> Is this not already done as-needed by ensure_export()?

Yes. This logic is in the ensure_export but only lio target uses DB and
other targets use file.
 
> > The lio target obtains authentication from provider_auth in database,
> > but tgtd, iet, cxt obtain authentication from file to recreate iSCSI target
> when c-vol is restarted.
> > If the file is missing, these volumes are exported without
> > authentication and configuration file is recreated as I mentioned above.
> >
> > tgtd: Get target chap auth from file
> > iet:  Get target chap auth from file
> > cxt:  Get target chap auth from file
> > lio:  Get target chap auth from Database(in provider_auth)
> > scst: Get target chap auth by using original command
> >
> > If we get authentication from DB for tgtd, iet and cxt same as lio, we
> > can recreate iSCSI target with proper authentication when c-vol is 
> > restarted.
> > I think this is a solution for this situation.
> >
> 
> This may be possible, but fixing the target config file to be written more
> safely to work as currently intended is still a win.

I think it is better to fix both of them,
(1) Add a logic to write configuration file using fsync
(2) Read authentication from database during ensure_export() same as lio target.

Thanks,
Mitsuhiro Tanino

> > Any thought?
> >
> > Thanks,
> > Mitsuhiro Tanino
> >
> >> -Original Message-
> >> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> >> Sent: Friday, September 25, 2015 12:48 PM
> >> To: openstack-dev@lists.openstack.org
> >> Subject: Re: [openstack-dev] [cinder] should we use fsync when
> >> writing iscsi config file?
> >>
> >> On 09/24/2015 04:21 PM, Chris Friesen wrote:
> >>> On 09/24/2015 12:18 PM, Chris Friesen wrote:
> >>>
> >>>>
> >>>> I think what happened is that we took the SIGTERM after the open()
> >>>> call in create_iscsi_target(), but before writing anything to the file.
> >>>>
> >>>>  f = open(volume_path, 'w+')
> >>>>  f.write(volume_conf)
> >>>>  f.close()
> >>>>
> >>>> The 'w+' causes the file to be immediately truncated on opening,
> >>>> leading to an empty file.
> >>>>
> >>>> To work around this, I think we need to

Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-25 Thread Chris Friesen

On 09/25/2015 12:30 PM, Mitsuhiro Tanino wrote:

On 09/22/2015 06:43 PM, Robert Collins wrote:

On 23 September 2015 at 09:52, Chris Friesen
 wrote:

Hi,

I recently had an issue with one file out of a dozen or so in
"/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm
running stable/kilo if it makes a difference.

Looking at the code in
volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we
should do a fsync() before the close().  The way it stands now, it
seems like it might be possible to write the file, start making use
of it, and then take a power outage before it actually gets written
to persistent storage.  When we come back up we could have an
instance expecting to make use of it, but no target information in the on-disk 
copy of the file.


I think even if there is no target information in configuration file dir, c-vol 
started successfully
and iSCSI targets were created automatically and volumes were exported, right?

There is an problem in this case that the iSCSI target was created without 
authentication because
we can't get previous authentication from the configuration file.

I'm curious what kind of problem did you met?


We had an issue in a private patch that was ported to Kilo without realizing 
that the data type of chap_auth had changed.



In my understanding, the provider_auth in database has user name and password 
for iSCSI target.
Therefore if we get authentication from DB, I think we can self-heal from this 
situation
correctly after c-vol service is restarted.

The lio target obtains authentication from provider_auth in database, but tgtd, 
iet, cxt obtain
authentication from file to recreate iSCSI target when c-vol is restarted.
If the file is missing, these volumes are exported without authentication and 
configuration
file is recreated as I mentioned above.

tgtd: Get target chap auth from file
iet:  Get target chap auth from file
cxt:  Get target chap auth from file
lio:  Get target chap auth from Database(in provider_auth)
scst: Get target chap auth by using original command

If we get authentication from DB for tgtd, iet and cxt same as lio, we can 
recreate iSCSI target
with proper authentication when c-vol is restarted.
I think this is a solution for this situation.


If we fixed the chap auth info then we could live with a zero-size file. 
However, with the current code if we take a kernel panic or power outage it's 
theoretically possible to end up with a corrupt file of nonzero size (due to 
metadata hitting the persistant storage before the data).  I'm not confident 
that the current code would deal properly with that.


That said, if we always regenerate every file from the DB on cinder-volume 
startup (regardless of whether or not it existed, and without reading in the 
existing file), then we'd be okay without the robustness improvements.


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-25 Thread Eric Harney
On 09/25/2015 02:30 PM, Mitsuhiro Tanino wrote:
> On 09/22/2015 06:43 PM, Robert Collins wrote:
>> On 23 September 2015 at 09:52, Chris Friesen 
>>  wrote:
>>> Hi,
>>>
>>> I recently had an issue with one file out of a dozen or so in 
>>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm 
>>> running stable/kilo if it makes a difference.
>>>
>>> Looking at the code in 
>>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we 
>>> should do a fsync() before the close().  The way it stands now, it 
>>> seems like it might be possible to write the file, start making use 
>>> of it, and then take a power outage before it actually gets written 
>>> to persistent storage.  When we come back up we could have an 
>>> instance expecting to make use of it, but no target information in the 
>>> on-disk copy of the file.
> 
> I think even if there is no target information in configuration file dir, 
> c-vol started successfully
> and iSCSI targets were created automatically and volumes were exported, right?
> 
> There is an problem in this case that the iSCSI target was created without 
> authentication because
> we can't get previous authentication from the configuration file.
> 
> I'm curious what kind of problem did you met?
>   
>> If its being kept in sync with DB records, and won't self-heal from 
>> this situation, then yes. e.g. if the overall workflow is something 
>> like
> 
> In my understanding, the provider_auth in database has user name and password 
> for iSCSI target. 
> Therefore if we get authentication from DB, I think we can self-heal from 
> this situation
> correctly after c-vol service is restarted.
> 

Is this not already done as-needed by ensure_export()?

> The lio target obtains authentication from provider_auth in database, but 
> tgtd, iet, cxt obtain
> authentication from file to recreate iSCSI target when c-vol is restarted.
> If the file is missing, these volumes are exported without authentication and 
> configuration
> file is recreated as I mentioned above.
> 
> tgtd: Get target chap auth from file
> iet:  Get target chap auth from file
> cxt:  Get target chap auth from file
> lio:  Get target chap auth from Database(in provider_auth)
> scst: Get target chap auth by using original command
> 
> If we get authentication from DB for tgtd, iet and cxt same as lio, we can 
> recreate iSCSI target
> with proper authentication when c-vol is restarted.
> I think this is a solution for this situation.
> 

This may be possible, but fixing the target config file to be written
more safely to work as currently intended is still a win.

> Any thought?
> 
> Thanks,
> Mitsuhiro Tanino
> 
>> -Original Message-
>> From: Chris Friesen [mailto:chris.frie...@windriver.com]
>> Sent: Friday, September 25, 2015 12:48 PM
>> To: openstack-dev@lists.openstack.org
>> Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi
>> config file?
>>
>> On 09/24/2015 04:21 PM, Chris Friesen wrote:
>>> On 09/24/2015 12:18 PM, Chris Friesen wrote:
>>>
>>>>
>>>> I think what happened is that we took the SIGTERM after the open()
>>>> call in create_iscsi_target(), but before writing anything to the file.
>>>>
>>>>  f = open(volume_path, 'w+')
>>>>  f.write(volume_conf)
>>>>  f.close()
>>>>
>>>> The 'w+' causes the file to be immediately truncated on opening,
>>>> leading to an empty file.
>>>>
>>>> To work around this, I think we need to do the classic "write to a
>>>> temporary file and then rename it to the desired filename" trick.
>>>> The atomicity of the rename ensures that either the old contents or the new
>> contents are present.
>>>
>>> I'm pretty sure that upstream code is still susceptible to zeroing out
>>> the file in the above scenario.  However, it doesn't take an
>>> exception--that's due to a local change on our part that attempted to fix 
>>> the
>> below issue.
>>>
>>> The stable/kilo code *does* have a problem in that when it regenerates
>>> the file it's missing the CHAP authentication line (beginning with
>> "incominguser").
>>
>> I've proposed a change at https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__review.openstack.org_-23_c_227943_&d=BQICAg&c=DZ-
>> EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SVlOqKiqO04_
>> NttKUIoDiaOR7cePB0SOA1bpjakqAss&s=q2_8XBAVH9lQ2mdT72nW7dN2EafIqJEpHGLBuf4K970&e=
>>
>> If anyone has suggestions on how to do this more robustly or more cleanly,
>> please let me know.
>>
>> Chris
>>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-25 Thread Mitsuhiro Tanino
On 09/22/2015 06:43 PM, Robert Collins wrote:
> On 23 September 2015 at 09:52, Chris Friesen 
>  wrote:
>> Hi,
>>
>> I recently had an issue with one file out of a dozen or so in 
>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm 
>> running stable/kilo if it makes a difference.
>>
>> Looking at the code in 
>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we 
>> should do a fsync() before the close().  The way it stands now, it 
>> seems like it might be possible to write the file, start making use 
>> of it, and then take a power outage before it actually gets written 
>> to persistent storage.  When we come back up we could have an 
>> instance expecting to make use of it, but no target information in the 
>> on-disk copy of the file.

I think even if there is no target information in configuration file dir, c-vol 
started successfully
and iSCSI targets were created automatically and volumes were exported, right?

There is an problem in this case that the iSCSI target was created without 
authentication because
we can't get previous authentication from the configuration file.

I'm curious what kind of problem did you met?
  
> If its being kept in sync with DB records, and won't self-heal from 
> this situation, then yes. e.g. if the overall workflow is something 
> like

In my understanding, the provider_auth in database has user name and password 
for iSCSI target. 
Therefore if we get authentication from DB, I think we can self-heal from this 
situation
correctly after c-vol service is restarted.

The lio target obtains authentication from provider_auth in database, but tgtd, 
iet, cxt obtain
authentication from file to recreate iSCSI target when c-vol is restarted.
If the file is missing, these volumes are exported without authentication and 
configuration
file is recreated as I mentioned above.

tgtd: Get target chap auth from file
iet:  Get target chap auth from file
cxt:  Get target chap auth from file
lio:  Get target chap auth from Database(in provider_auth)
scst: Get target chap auth by using original command

If we get authentication from DB for tgtd, iet and cxt same as lio, we can 
recreate iSCSI target
with proper authentication when c-vol is restarted.
I think this is a solution for this situation.

Any thought?

Thanks,
Mitsuhiro Tanino

> -Original Message-
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Friday, September 25, 2015 12:48 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi
> config file?
> 
> On 09/24/2015 04:21 PM, Chris Friesen wrote:
> > On 09/24/2015 12:18 PM, Chris Friesen wrote:
> >
> >>
> >> I think what happened is that we took the SIGTERM after the open()
> >> call in create_iscsi_target(), but before writing anything to the file.
> >>
> >>  f = open(volume_path, 'w+')
> >>  f.write(volume_conf)
> >>  f.close()
> >>
> >> The 'w+' causes the file to be immediately truncated on opening,
> >> leading to an empty file.
> >>
> >> To work around this, I think we need to do the classic "write to a
> >> temporary file and then rename it to the desired filename" trick.
> >> The atomicity of the rename ensures that either the old contents or the new
> contents are present.
> >
> > I'm pretty sure that upstream code is still susceptible to zeroing out
> > the file in the above scenario.  However, it doesn't take an
> > exception--that's due to a local change on our part that attempted to fix 
> > the
> below issue.
> >
> > The stable/kilo code *does* have a problem in that when it regenerates
> > the file it's missing the CHAP authentication line (beginning with
> "incominguser").
> 
> I've proposed a change at https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__review.openstack.org_-23_c_227943_&d=BQICAg&c=DZ-
> EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SVlOqKiqO04_
> NttKUIoDiaOR7cePB0SOA1bpjakqAss&s=q2_8XBAVH9lQ2mdT72nW7dN2EafIqJEpHGLBuf4K970&e=
> 
> If anyone has suggestions on how to do this more robustly or more cleanly,
> please let me know.
> 
> Chris
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_cgi-
> 2Dbin_mailman_listinfo_openstack-2Ddev&d=BQICAg&c=DZ-
> EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SVlOqKiqO04_
> NttKUIoDiaOR7cePB0SOA1bpjakqAss&s=0DBbmeXSIK2c5QlBnwURY1iwNN1AXuqOLaUYnxjBl0w&e=

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-25 Thread Chris Friesen

On 09/24/2015 04:21 PM, Chris Friesen wrote:

On 09/24/2015 12:18 PM, Chris Friesen wrote:



I think what happened is that we took the SIGTERM after the open() call in
create_iscsi_target(), but before writing anything to the file.

 f = open(volume_path, 'w+')
 f.write(volume_conf)
 f.close()

The 'w+' causes the file to be immediately truncated on opening, leading to an
empty file.

To work around this, I think we need to do the classic "write to a temporary
file and then rename it to the desired filename" trick.  The atomicity of the
rename ensures that either the old contents or the new contents are present.


I'm pretty sure that upstream code is still susceptible to zeroing out the file
in the above scenario.  However, it doesn't take an exception--that's due to a
local change on our part that attempted to fix the below issue.

The stable/kilo code *does* have a problem in that when it regenerates the file
it's missing the CHAP authentication line (beginning with "incominguser").


I've proposed a change at https://review.openstack.org/#/c/227943/

If anyone has suggestions on how to do this more robustly or more cleanly, 
please let me know.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-24 Thread Chris Friesen

On 09/24/2015 12:18 PM, Chris Friesen wrote:



I think what happened is that we took the SIGTERM after the open() call in
create_iscsi_target(), but before writing anything to the file.

 f = open(volume_path, 'w+')
 f.write(volume_conf)
 f.close()

The 'w+' causes the file to be immediately truncated on opening, leading to an
empty file.

To work around this, I think we need to do the classic "write to a temporary
file and then rename it to the desired filename" trick.  The atomicity of the
rename ensures that either the old contents or the new contents are present.


I'm pretty sure that upstream code is still susceptible to zeroing out the file 
in the above scenario.  However, it doesn't take an exception--that's due to a 
local change on our part that attempted to fix the below issue.


The stable/kilo code *does* have a problem in that when it regenerates the file 
it's missing the CHAP authentication line (beginning with "incominguser").


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-24 Thread Chris Friesen

On 09/24/2015 10:54 AM, Chris Friesen wrote:


I took another look at the code and realized that the file *should* get rebuilt
on restart after a power outage--if the file already exists it will print a
warning message in the logs but it should still overwrite the contents of the
file with the desired contents.  However, that didn't happen in my case.

That made me confused about how I ever ended up with an empty persistence file.
  I went back to my logs and found this:

File "./usr/lib64/python2.7/site-packages/cinder/volume/manager.py", line 334,
in init_host
File "/usr/lib64/python2.7/site-packages/osprofiler/profiler.py", line 105, in
wrapper
File "./usr/lib64/python2.7/site-packages/cinder/volume/drivers/lvm.py", line
603, in ensure_export
File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/iscsi.py", line
296, in ensure_export
File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/tgt.py", line
185, in create_iscsi_target
TypeError: not enough arguments for format string


So it seems like we might have a bug in the handling of an empty file.


And I think I know how we got the empty file in the first place, and it wasn't 
the original file creation but rather the file re-creation.


I have logs from shortly before the above logs showing cinder-volume receiving a 
SIGTERM while it was processing the volume in question:



2015-09-21 19:23:59.123 12429 WARNING cinder.volume.targets.tgt 
[req-7d092503-198a-4f59-97e9-d4d520d38379 - - - - -] Persistence file already 
exists for volume, found file at: 
/opt/cgcs/cinder/data/volumes/volume-76c5f285-a15e-474e-b59e-fd609a624090
2015-09-21 19:24:01.252 12429 WARNING cinder.volume.targets.tgt 
[req-7d092503-198a-4f59-97e9-d4d520d38379 - - - - -] Persistence file already 
exists for volume, found file at: 
/opt/cgcs/cinder/data/volumes/volume-993c94b2-e256-4baf-ab55-805a8e28f547
2015-09-21 19:24:01.951 8201 INFO cinder.openstack.common.service 
[req-904f88a8-8e6f-425e-8df7-5cbb9baae0c5 - - - - -] Caught SIGTERM, stopping 
children



I think what happened is that we took the SIGTERM after the open() call in 
create_iscsi_target(), but before writing anything to the file.


f = open(volume_path, 'w+')
f.write(volume_conf)
f.close()

The 'w+' causes the file to be immediately truncated on opening, leading to an 
empty file.


To work around this, I think we need to do the classic "write to a temporary 
file and then rename it to the desired filename" trick.  The atomicity of the 
rename ensures that either the old contents or the new contents are present.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-24 Thread Chris Friesen

On 09/22/2015 06:19 PM, John Griffith wrote:

On Tue, Sep 22, 2015 at 6:17 PM, John Griffith 


​That target file pretty much "is" the persistence record, the db entry is
the iqn and provider info only.  I think that adding the fdatasync isn't a
bad idea at all.  At the very least it doesn't hurt.  Power losses on attach
I would expect to be problematic regardless.

​Let me clarify the statement above, if you loose power to the node in the
middle of an attach process and the file wasn't written properly you're most
likely 'stuck' and will have to detach (which deletes the file) or it will be in
an error state and rebuild the file when you try the attach again anyway IIRC,
it's been a while since we've mucked with that code (thank goodness)!!


I took another look at the code and realized that the file *should* get rebuilt 
on restart after a power outage--if the file already exists it will print a 
warning message in the logs but it should still overwrite the contents of the 
file with the desired contents.  However, that didn't happen in my case.


That made me confused about how I ever ended up with an empty persistence file. 
 I went back to my logs and found this:


File "./usr/lib64/python2.7/site-packages/cinder/volume/manager.py", line 334, 
in init_host
File "/usr/lib64/python2.7/site-packages/osprofiler/profiler.py", line 105, in 
wrapper
File "./usr/lib64/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 
603, in ensure_export
File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/iscsi.py", line 
296, in ensure_export
File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/tgt.py", line 
185, in create_iscsi_target

TypeError: not enough arguments for format string


So it seems like we might have a bug in the handling of an empty file.


In kilo/stable, line 185 code looks like this:

chap_str = 'incominguser %s %s' % chap_auth

This means that "chap_auth" must not be None coming into this function.  In 
volume.targets.iscsi.ISCSITarget.ensure_export() we call


chap_auth = self._get_target_chap_auth(context, iscsi_name)

which I think would return None given the empty file.

I tried to reproduce with devstack running stable/kilo by booting from volume 
and then removing the persistance file and restarting cinder-volume.  It 
recreated the persistance file, but the "incominguser" line was missing 
completely from the regenerated file.


I think I need to try to reproduce this in our load with some extra debugging, 
see if I can figure out exactly what's going on.



Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread Joshua Harlow

Chris Friesen wrote:

On 09/22/2015 05:48 PM, Joshua Harlow wrote:

A present:

>>> import contextlib
>>> import os
>>>
>>> @contextlib.contextmanager
... def synced_file(path, mode='wb'):
... with open(path, mode) as fh:
... yield fh
... os.fdatasync(fh.fileno())
...
>>> with synced_file("/tmp/b.txt") as fh:
... fh.write("b")


Isn't that missing an "fh.flush()" somewhere before the fdatasync()?


I was testing you, obviously, lol, congrats you passed ;)



Chris




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread John Griffith
On Tue, Sep 22, 2015 at 6:17 PM, John Griffith 
wrote:

>
>
> On Tue, Sep 22, 2015 at 5:48 PM, Joshua Harlow 
> wrote:
>
>> A present:
>>
>> >>> import contextlib
>> >>> import os
>> >>>
>> >>> @contextlib.contextmanager
>> ... def synced_file(path, mode='wb'):
>> ...   with open(path, mode) as fh:
>> ...  yield fh
>> ...  os.fdatasync(fh.fileno())
>> ...
>> >>> with synced_file("/tmp/b.txt") as fh:
>> ...fh.write("b")
>> ...
>>
>> Have fun :-P
>>
>> -Josh
>>
>>
>> Robert Collins wrote:
>>
>>> On 23 September 2015 at 09:52, Chris Friesen
>>>   wrote:
>>>
 Hi,

 I recently had an issue with one file out of a dozen or so in
 "/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm
 running stable/kilo if it makes a difference.

 Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(),
 I'm
 wondering if we should do a fsync() before the close().  The way it
 stands
 now, it seems like it might be possible to write the file, start making
 use
 of it, and then take a power outage before it actually gets written to
 persistent storage.  When we come back up we could have an instance
 expecting to make use of it, but no target information in the on-disk
 copy
 of the file.

>>>
>>> If its being kept in sync with DB records, and won't self-heal from
>>> this situation, then yes. e.g. if the overall workflow is something
>>> like
>>>
>>> receive RPC request
>>> do some work, including writing the file
>>> reply to the RPC with 'ok'
>>> ->  gets written to the DB and the state recorded as ACTIVE or whatever..
>>>
>>> then yes, we need to behave as though its active even if the machine
>>> is power cycled robustly.
>>>
>>> Even then, fsync() explicitly says that it doesn't ensure that the
 directory
 changes have reached disk...for that another explicit fsync() on the
 file
 descriptor for the directory is needed.
 So I think for robustness we'd want to add

 f.flush()
 os.fsync(f.fileno())

>>>
>>> fdatasync would be better - we don't care about the metadata.
>>>
>>> between the existing f.write() and f.close(), and then add something
 like:

 f = open(volumes_dir, 'w+')
 os.fsync(f.fileno())
 f.close()

>>>
>>> Yup, but again - fdatasync here I believe.
>>>
>>> -Rob
>>>
>>>
>>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> ​That target file pretty much "is" the persistence record, the db entry is
> the iqn and provider info only.  I think that adding the fdatasync isn't a
> bad idea at all.  At the very least it doesn't hurt.  Power losses on
> attach I would expect to be problematic regardless.
>

​Let me clarify the statement above, if you loose power to the node in the
middle of an attach process and the file wasn't written properly you're
most likely 'stuck' and will have to detach (which deletes the file) or it
will be in an error state and rebuild the file when you try the attach
again anyway IIRC, it's been a while since we've mucked with that code
(thank goodness)!!
​


>
> Keep in mind that file is built as part of the attach process stemming
> from initialize-connection.
>
> John
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread John Griffith
On Tue, Sep 22, 2015 at 5:48 PM, Joshua Harlow  wrote:

> A present:
>
> >>> import contextlib
> >>> import os
> >>>
> >>> @contextlib.contextmanager
> ... def synced_file(path, mode='wb'):
> ...   with open(path, mode) as fh:
> ...  yield fh
> ...  os.fdatasync(fh.fileno())
> ...
> >>> with synced_file("/tmp/b.txt") as fh:
> ...fh.write("b")
> ...
>
> Have fun :-P
>
> -Josh
>
>
> Robert Collins wrote:
>
>> On 23 September 2015 at 09:52, Chris Friesen
>>   wrote:
>>
>>> Hi,
>>>
>>> I recently had an issue with one file out of a dozen or so in
>>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm
>>> running stable/kilo if it makes a difference.
>>>
>>> Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(),
>>> I'm
>>> wondering if we should do a fsync() before the close().  The way it
>>> stands
>>> now, it seems like it might be possible to write the file, start making
>>> use
>>> of it, and then take a power outage before it actually gets written to
>>> persistent storage.  When we come back up we could have an instance
>>> expecting to make use of it, but no target information in the on-disk
>>> copy
>>> of the file.
>>>
>>
>> If its being kept in sync with DB records, and won't self-heal from
>> this situation, then yes. e.g. if the overall workflow is something
>> like
>>
>> receive RPC request
>> do some work, including writing the file
>> reply to the RPC with 'ok'
>> ->  gets written to the DB and the state recorded as ACTIVE or whatever..
>>
>> then yes, we need to behave as though its active even if the machine
>> is power cycled robustly.
>>
>> Even then, fsync() explicitly says that it doesn't ensure that the
>>> directory
>>> changes have reached disk...for that another explicit fsync() on the file
>>> descriptor for the directory is needed.
>>> So I think for robustness we'd want to add
>>>
>>> f.flush()
>>> os.fsync(f.fileno())
>>>
>>
>> fdatasync would be better - we don't care about the metadata.
>>
>> between the existing f.write() and f.close(), and then add something like:
>>>
>>> f = open(volumes_dir, 'w+')
>>> os.fsync(f.fileno())
>>> f.close()
>>>
>>
>> Yup, but again - fdatasync here I believe.
>>
>> -Rob
>>
>>
>>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
​That target file pretty much "is" the persistence record, the db entry is
the iqn and provider info only.  I think that adding the fdatasync isn't a
bad idea at all.  At the very least it doesn't hurt.  Power losses on
attach I would expect to be problematic regardless.

Keep in mind that file is built as part of the attach process stemming from
initialize-connection.

John
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread Chris Friesen

On 09/22/2015 05:48 PM, Joshua Harlow wrote:

A present:

 >>> import contextlib
 >>> import os
 >>>
 >>> @contextlib.contextmanager
... def synced_file(path, mode='wb'):
...   with open(path, mode) as fh:
...  yield fh
...  os.fdatasync(fh.fileno())
...
 >>> with synced_file("/tmp/b.txt") as fh:
...fh.write("b")


Isn't that missing an "fh.flush()" somewhere before the fdatasync()?

Chris




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread Joshua Harlow

A present:

>>> import contextlib
>>> import os
>>>
>>> @contextlib.contextmanager
... def synced_file(path, mode='wb'):
...   with open(path, mode) as fh:
...  yield fh
...  os.fdatasync(fh.fileno())
...
>>> with synced_file("/tmp/b.txt") as fh:
...fh.write("b")
...

Have fun :-P

-Josh

Robert Collins wrote:

On 23 September 2015 at 09:52, Chris Friesen
  wrote:

Hi,

I recently had an issue with one file out of a dozen or so in
"/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm
running stable/kilo if it makes a difference.

Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm
wondering if we should do a fsync() before the close().  The way it stands
now, it seems like it might be possible to write the file, start making use
of it, and then take a power outage before it actually gets written to
persistent storage.  When we come back up we could have an instance
expecting to make use of it, but no target information in the on-disk copy
of the file.


If its being kept in sync with DB records, and won't self-heal from
this situation, then yes. e.g. if the overall workflow is something
like

receive RPC request
do some work, including writing the file
reply to the RPC with 'ok'
->  gets written to the DB and the state recorded as ACTIVE or whatever..

then yes, we need to behave as though its active even if the machine
is power cycled robustly.


Even then, fsync() explicitly says that it doesn't ensure that the directory
changes have reached disk...for that another explicit fsync() on the file
descriptor for the directory is needed.
So I think for robustness we'd want to add

f.flush()
os.fsync(f.fileno())


fdatasync would be better - we don't care about the metadata.


between the existing f.write() and f.close(), and then add something like:

f = open(volumes_dir, 'w+')
os.fsync(f.fileno())
f.close()


Yup, but again - fdatasync here I believe.

-Rob




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread Robert Collins
On 23 September 2015 at 09:52, Chris Friesen
 wrote:
> Hi,
>
> I recently had an issue with one file out of a dozen or so in
> "/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm
> running stable/kilo if it makes a difference.
>
> Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm
> wondering if we should do a fsync() before the close().  The way it stands
> now, it seems like it might be possible to write the file, start making use
> of it, and then take a power outage before it actually gets written to
> persistent storage.  When we come back up we could have an instance
> expecting to make use of it, but no target information in the on-disk copy
> of the file.

If its being kept in sync with DB records, and won't self-heal from
this situation, then yes. e.g. if the overall workflow is something
like

receive RPC request
do some work, including writing the file
reply to the RPC with 'ok'
-> gets written to the DB and the state recorded as ACTIVE or whatever..

then yes, we need to behave as though its active even if the machine
is power cycled robustly.

> Even then, fsync() explicitly says that it doesn't ensure that the directory
> changes have reached disk...for that another explicit fsync() on the file
> descriptor for the directory is needed.
> So I think for robustness we'd want to add
>
> f.flush()
> os.fsync(f.fileno())

fdatasync would be better - we don't care about the metadata.

> between the existing f.write() and f.close(), and then add something like:
>
> f = open(volumes_dir, 'w+')
> os.fsync(f.fileno())
> f.close()

Yup, but again - fdatasync here I believe.

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [cinder] should we use fsync when writing iscsi config file?

2015-09-22 Thread Chris Friesen

Hi,

I recently had an issue with one file out of a dozen or so in 
"/opt/cgcs/cinder/data/volumes/" being present but of size zero.  I'm running 
stable/kilo if it makes a difference.


Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm 
wondering if we should do a fsync() before the close().  The way it stands now, 
it seems like it might be possible to write the file, start making use of it, 
and then take a power outage before it actually gets written to persistent 
storage.  When we come back up we could have an instance expecting to make use 
of it, but no target information in the on-disk copy of the file.


Even then, fsync() explicitly says that it doesn't ensure that the directory 
changes have reached disk...for that another explicit fsync() on the file 
descriptor for the directory is needed.


So I think for robustness we'd want to add

f.flush()
os.fsync(f.fileno())

between the existing f.write() and f.close(), and then add something like:

f = open(volumes_dir, 'w+')
os.fsync(f.fileno())
f.close()

Thoughts?

Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev