Re: [Gluster-devel] Rebalance data migration and corruption
> > hmm.. I would prefer an infinite timeout. The only scenario where the
> > brick process can forcefully flush leases would be connection loss with
> > the rebalance process. The more scenarios where the brick can flush
> > leases without the knowledge of the rebalance process, the more race
> > windows we open up for this bug to occur.
> >
> > In fact, at least in theory, to be correct the rebalance process should
> > replay all the transactions that happened during the lease which got
> > flushed out by the brick (after re-acquiring that lease). So, we would
> > like to avoid any such scenarios.
> >
> > Btw, what is the necessity of timeouts? Is it an insurance against rogue
> > clients who won't respond to lease recalls?
>
> yes. It is to protect from rogue clients and prevent starvation of other
> clients.
>
> In the current design, every lease is associated with a lease-id (like
> lockowner in the case of locks) and all further fops (I/Os) have to be
> done using this lease-id. So in case any fop comes to the brick process
> with the lease-id of a lease which got flushed by the brick process, we
> can send a special error and the rebalance process can then replay all
> those fops. Will that be sufficient?

How do I pass the lease-id in a fop like readv? Should I pass it in xdata?

This is sufficient for the rebalance process. It can follow this algorithm:

1. Acquire a read-lease on the entire file on src.
2. Note the offset at which this transaction has started. Initially it'll
   be zero, but if leases were recalled, the offset will be the
   continuation from where the last transaction left off.
3. Do multiple (read, src) and (write, dst).
4. If (read, src) returns an error (because of the lease being flushed),
   go to step 1 and start the transaction from the offset remembered in
   step 2. Note that we don't update the offset here; we replay this
   failed transaction again. We update the offset only on a successful
   unlock.

On receiving a lease-recall notification from the brick, the rebalance
process does:

1. Note the offset till which it has successfully copied the file from src
   to dst.
2. Make sure at least one (read, src) and (write, dst) is done since we
   last acquired the lease (at least on a best-effort basis). This will
   ensure that the rebalance process won't get stuck in an infinite loop.
3. Issue an unlock. If the unlock is successful, the next transaction will
   continue from the offset noted in 1. Else, this transaction is
   considered a failure and the rebalance process behaves exactly as it
   does when a read fails because of lease expiry (above).

In this algorithm, to avoid the rebalance process getting stuck in an
infinite loop, we should make sure unlocks are successful (to the extent
they can be made successful). We can also add a maximum number of retries
for a transaction on the same region of the file and fail the migration
once we exceed that many retries.

> CCing Poornima who has been implementing it.
>
> Thanks,
> Soumya

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
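The retry logic described above can be sketched as a small simulation. This is only an illustration, not Gluster code: `LeaseFlushed` and `reader` are invented stand-ins for the real lease-id-carrying fops, and the offset is committed only when a transaction completes, mirroring the "update the offset only on a successful unlock" rule.

```python
class LeaseFlushed(Exception):
    """Special error the brick returns for fops carrying a stale lease-id."""

def migrate(src, dst, reader, txn_blocks=2, block=4):
    """Copy src into dst in lease-protected transactions.

    `offset` is committed only when a transaction finishes; if the lease is
    flushed mid-transaction, the whole transaction is replayed from the
    offset remembered at its start (steps 2 and 4 above).
    """
    offset = 0                                # updated only on successful unlock
    while offset < len(src):
        cursor = offset                       # step 2: remember transaction start
        try:
            for _ in range(txn_blocks):       # step 3: (read, src) + (write, dst)
                if cursor >= len(src):
                    break
                chunk = reader(src, cursor, block)       # may raise LeaseFlushed
                dst[cursor:cursor + len(chunk)] = chunk
                cursor += len(chunk)
            offset = cursor                   # unlock succeeded: commit offset
        except LeaseFlushed:
            pass                              # step 4: replay; offset unchanged
    return bytes(dst)

# Demo: the lease is flushed once at offset 4; the failed transaction is
# replayed from its starting offset, so no region is skipped or reordered.
flushed = [True]
def flaky_reader(data, off, n):
    if flushed and off == 4:
        flushed.pop()                         # flush only once
        raise LeaseFlushed()
    return data[off:off + n]

result = migrate(b"abcdefgh", bytearray(8), flaky_reader)
```

The key property the sketch demonstrates is that a flush never advances the committed offset, so the destination always ends up byte-identical to the source no matter where the recall lands.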
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/09/2016 12:30 PM, Raghavendra G wrote:
> > > > Right. But if there are simultaneous accesses to the same file from
> > > > any other client and the rebalance process, delegations shall not be
> > > > granted, or revoked if granted, even though they are operating at
> > > > different offsets. So if you rely only on delegations, migration may
> > > > not proceed if an application has held a lock or is doing any I/Os.
> > >
> > > Does the brick process wait for the response of the delegation holder
> > > (the rebalance process here) before it wipes out the delegation/locks?
> > > If that's the case, the rebalance process can complete one transaction
> > > of (read, src) and (write, dst) before responding to a delegation
> > > recall. That way there is no starvation for either applications or the
> > > rebalance process (though this makes both of them slower, but that
> > > cannot be helped I think).
> >
> > yes. Brick process should wait for a certain period before revoking the
> > delegations forcefully in case they are not returned by the client. Also
> > if required (like done by NFS servers) we can choose to increase this
> > timeout value at run time if the client is diligently flushing the data.
>
> hmm.. I would prefer an infinite timeout. The only scenario where the
> brick process can forcefully flush leases would be connection loss with
> the rebalance process. The more scenarios where the brick can flush
> leases without the knowledge of the rebalance process, the more race
> windows we open up for this bug to occur.
>
> In fact, at least in theory, to be correct the rebalance process should
> replay all the transactions that happened during the lease which got
> flushed out by the brick (after re-acquiring that lease). So, we would
> like to avoid any such scenarios.
>
> Btw, what is the necessity of timeouts? Is it an insurance against rogue
> clients who won't respond to lease recalls?

yes. It is to protect from rogue clients and prevent starvation of other
clients.

In the current design, every lease is associated with a lease-id (like
lockowner in the case of locks) and all further fops (I/Os) have to be
done using this lease-id. So in case any fop comes to the brick process
with the lease-id of a lease which got flushed by the brick process, we
can send a special error and the rebalance process can then replay all
those fops. Will that be sufficient?

CCing Poornima who has been implementing it.

Thanks,
Soumya
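The recall-with-timeout behaviour being debated can be modelled in a few lines. This is a toy, not the brick implementation: `Lease`, `poll`, and `made_progress` are invented names. It shows the policy under discussion: extend the deadline while the holder shows progress (as NFS servers do), and revoke forcefully only when a rogue client neither returns the lease nor makes progress.

```python
class Lease:
    """Toy lease holder: returns the lease after `returns_after` polls,
    or never (a rogue client) when `returns_after` is None."""
    def __init__(self, returns_after=None):
        self.returned = False
        self.revoked = False
        self._returns_after = returns_after
        self._polls = 0

    def poll(self):                      # one chance for the holder to act
        self._polls += 1
        if self._returns_after is not None and self._polls >= self._returns_after:
            self.returned = True

    def made_progress(self):             # diligent flushing would extend the
        return False                     # timeout; this toy holder never flushes

def recall(lease, timeout_polls=10):
    """Recall a lease; revoke forcefully only after the timeout lapses."""
    budget = timeout_polls
    while budget > 0:
        lease.poll()
        if lease.returned:
            return "returned"
        if lease.made_progress():        # run-time timeout extension
            budget = timeout_polls
        else:
            budget -= 1
    lease.revoked = True                 # starvation protection kicks in
    return "revoked"
```

An infinite timeout corresponds to never leaving the loop for a non-rogue holder; the finite budget exists purely so a dead or misbehaving client cannot starve other clients forever.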
Re: [Gluster-devel] Rebalance data migration and corruption
> > > Right. But if there are simultaneous accesses to the same file from
> > > any other client and the rebalance process, delegations shall not be
> > > granted, or revoked if granted, even though they are operating at
> > > different offsets. So if you rely only on delegations, migration may
> > > not proceed if an application has held a lock or is doing any I/Os.
> >
> > Does the brick process wait for the response of the delegation holder
> > (the rebalance process here) before it wipes out the delegation/locks?
> > If that's the case, the rebalance process can complete one transaction
> > of (read, src) and (write, dst) before responding to a delegation
> > recall. That way there is no starvation for either applications or the
> > rebalance process (though this makes both of them slower, but that
> > cannot be helped I think).
>
> yes. Brick process should wait for a certain period before revoking the
> delegations forcefully in case they are not returned by the client. Also
> if required (like done by NFS servers) we can choose to increase this
> timeout value at run time if the client is diligently flushing the data.

hmm.. I would prefer an infinite timeout. The only scenario where the
brick process can forcefully flush leases would be connection loss with
the rebalance process. The more scenarios where the brick can flush
leases without the knowledge of the rebalance process, the more race
windows we open up for this bug to occur.

In fact, at least in theory, to be correct the rebalance process should
replay all the transactions that happened during the lease which got
flushed out by the brick (after re-acquiring that lease). So, we would
like to avoid any such scenarios.

Btw, what is the necessity of timeouts? Is it an insurance against rogue
clients who won't respond to lease recalls?
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/09/2016 10:27 AM, Raghavendra G wrote:
> On Mon, Feb 8, 2016 at 4:31 PM, Soumya Koduri <skod...@redhat.com> wrote:
>> On 02/08/2016 09:13 AM, Shyam wrote:
>>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
>>>> - Original Message -
>>>> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
>>>> To: "Sakshi Bansal" <saban...@redhat.com>, "Susant Palai" <spa...@redhat.com>
>>>> Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Nithya Balachandran" <nbala...@redhat.com>, "Shyamsundar Ranganathan" <srang...@redhat.com>
>>>> Sent: Friday, February 5, 2016 4:32:40 PM
>>>> Subject: Re: Rebalance data migration and corruption
>>>>
>>>> +gluster-devel
>>>>
>>>> Hi Sakshi/Susant,
>>>>
>>>> - There is a data corruption issue in the migration code. The
>>>>   rebalance process,
>>>>   1. Reads data from src
>>>>   2. Writes (say w1) it to dst
>>>>
>>>>   However, 1 and 2 are not atomic, so another write (say w2) to the
>>>>   same region can happen between 1 and 2. But these two writes can
>>>>   reach dst in the order (w2, w1), resulting in a subtle corruption.
>>>>   This issue is not fixed yet and can cause subtle data corruptions.
>>>>   The fix is simple and involves the rebalance process acquiring a
>>>>   mandatory lock to make 1 and 2 atomic.
>>>>
>>>> We can make use of the compound fop framework to make sure we don't
>>>> suffer a significant performance hit. Following will be the sequence
>>>> of operations done by the rebalance process:
>>>>
>>>> 1. issues a compound (mandatory lock, read) operation on src.
>>>> 2. writes this data to dst.
>>>> 3. issues unlock of the lock acquired in 1.
>>>>
>>>> Please co-ordinate with Anuradha for implementation of this compound
>>>> fop.
>>>>
>>>> Following are the issues I see with this approach:
>>>> 1. features/locks provides mandatory lock functionality only for
>>>>    posix-locks (flock and fcntl based locks). So, mandatory locks
>>>>    will be posix-locks, which will conflict with locks held by the
>>>>    application. So, if an application has held an fcntl/flock,
>>>>    migration cannot proceed.
>>
>> What if the file is opened with O_NONBLOCK? Can't the rebalance process
>> skip the file and continue in case mandatory lock acquisition fails?
>
> Similar functionality can be achieved by acquiring a non-blocking
> inodelk, like SETLK (as opposed to SETLKW). However, whether the
> rebalance process should block or not depends on the use case. In some
> use cases (like remove-brick) the rebalance process _has_ to migrate all
> the files. Even for other scenarios, skipping too many files is not a
> good idea as it beats the purpose of running rebalance. So one of the
> design goals is to migrate as many files as possible without making the
> design too complex.
>
>>>> We can implement a "special" domain for mandatory internal locks.
>>>> These locks will behave similar to posix mandatory locks in that
>>>> conflicting fops (like write, read) are blocked/failed if they are
>>>> done while a lock is held.
>>
>> So is the only difference between mandatory internal locks and posix
>> mandatory locks that internal locks shall not conflict with other
>> application locks (advisory/mandatory)?
>
> Yes. Mandatory internal locks (aka mandatory inodelks for this
> discussion) will conflict only in their domain. They also conflict with
> any fops that might change the file (primarily write here, but different
> fops can be added based on requirement). So in a fop like writev we need
> to check two lists - the external lock (posix lock) list _and_ the
> mandatory inodelk list.
>
> The reason (if not clear) for using mandatory locks by the rebalance
> process is that clients need not be bothered with acquiring a lock
> (which would unnecessarily degrade I/O performance when there is no
> rebalance going on). Thanks to Raghavendra Talur for suggesting this
> idea (though in a different context of lock migration, but the use-cases
> are similar).
>
>>>> 2. data migration will be less efficient because of an extra unlock
>>>>    (with compound lock + read) or extra lock and unlock (for
>>>>    non-compound fop based
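The two-list check described above can be illustrated with a small sketch. These are hypothetical structures, not the features/locks implementation: a modifying fop must clear both the application's posix-lock list and the rebalance process's mandatory inodelks in the special domain, while the two lists never conflict with each other, so clients pay nothing extra when no rebalance is running.

```python
def overlaps(a, b):
    """Half-open byte ranges (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

class LockTables:
    def __init__(self):
        self.posix = []        # (start, end, owner): application locks
        self.inodelk = []      # (start, end, owner): mandatory internal
                               # locks in the "special" rebalance domain

    def writev_allowed(self, start, end, owner):
        """A modifying fop must be checked against BOTH lists."""
        for s, e, o in self.inodelk:        # rebalance's mandatory locks
            if o != owner and overlaps((s, e), (start, end)):
                return False
        for s, e, o in self.posix:          # ordinary application locks
            if o != owner and overlaps((s, e), (start, end)):
                return False
        return True

tables = LockTables()
tables.inodelk.append((0, 128, "rebalance"))        # region under migration
blocked = tables.writev_allowed(64, 96, "app")      # overlaps the migration
allowed = tables.writev_allowed(256, 512, "app")    # untouched region
```

Because the inodelk list is empty whenever no rebalance is in progress, the extra check is a no-op on the common I/O path, which is the point of keeping these locks out of the application-visible posix list.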
Re: [Gluster-devel] Rebalance data migration and corruption
On Mon, Feb 8, 2016 at 4:31 PM, Soumya Koduri wrote:
> On 02/08/2016 09:13 AM, Shyam wrote:
>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
>>> - Original Message -
>>> From: "Raghavendra Gowdappa"
>>> To: "Sakshi Bansal", "Susant Palai"
>>> Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
>>> Sent: Friday, February 5, 2016 4:32:40 PM
>>> Subject: Re: Rebalance data migration and corruption
>>>
>>> +gluster-devel
>>>
>>>> Hi Sakshi/Susant,
>>>>
>>>> - There is a data corruption issue in the migration code. The
>>>>   rebalance process,
>>>>   1. Reads data from src
>>>>   2. Writes (say w1) it to dst
>>>>
>>>>   However, 1 and 2 are not atomic, so another write (say w2) to the
>>>>   same region can happen between 1 and 2. But these two writes can
>>>>   reach dst in the order (w2, w1), resulting in a subtle corruption.
>>>>   This issue is not fixed yet and can cause subtle data corruptions.
>>>>   The fix is simple and involves the rebalance process acquiring a
>>>>   mandatory lock to make 1 and 2 atomic.
>>>
>>> We can make use of the compound fop framework to make sure we don't
>>> suffer a significant performance hit. Following will be the sequence of
>>> operations done by the rebalance process:
>>>
>>> 1. issues a compound (mandatory lock, read) operation on src.
>>> 2. writes this data to dst.
>>> 3. issues unlock of the lock acquired in 1.
>>>
>>> Please co-ordinate with Anuradha for implementation of this compound
>>> fop.
>>>
>>> Following are the issues I see with this approach:
>>> 1. features/locks provides mandatory lock functionality only for
>>>    posix-locks (flock and fcntl based locks). So, mandatory locks will
>>>    be posix-locks, which will conflict with locks held by the
>>>    application. So, if an application has held an fcntl/flock,
>>>    migration cannot proceed.
>
> What if the file is opened with O_NONBLOCK? Can't the rebalance process
> skip the file and continue in case mandatory lock acquisition fails?

Similar functionality can be achieved by acquiring a non-blocking inodelk,
like SETLK (as opposed to SETLKW). However, whether the rebalance process
should block or not depends on the use case. In some use cases (like
remove-brick) the rebalance process _has_ to migrate all the files. Even
for other scenarios, skipping too many files is not a good idea as it
beats the purpose of running rebalance. So one of the design goals is to
migrate as many files as possible without making the design too complex.

>>> We can implement a "special" domain for mandatory internal locks.
>>> These locks will behave similar to posix mandatory locks in that
>>> conflicting fops (like write, read) are blocked/failed if they are
>>> done while a lock is held.
>
> So is the only difference between mandatory internal locks and posix
> mandatory locks that internal locks shall not conflict with other
> application locks (advisory/mandatory)?

Yes. Mandatory internal locks (aka mandatory inodelks for this discussion)
will conflict only in their domain. They also conflict with any fops that
might change the file (primarily write here, but different fops can be
added based on requirement). So in a fop like writev we need to check two
lists - the external lock (posix lock) list _and_ the mandatory inodelk
list.

The reason (if not clear) for using mandatory locks by the rebalance
process is that clients need not be bothered with acquiring a lock (which
would unnecessarily degrade I/O performance when there is no rebalance
going on). Thanks to Raghavendra Talur for suggesting this idea (though in
a different context of lock migration, but the use-cases are similar).

>>> 2. data migration will be less efficient because of an extra unlock
>>>    (with compound lock + read) or extra lock and unlock (for
>>>    non-compound fop based implementation) for every read it does from
>>>    src.
>>>
>>> Can we use delegations here? The rebalance process can acquire a
>>> mandatory-write-delegation (an exclusive lock with a functionality
>>> that the delegation is recalled when a write operation happens). In
>>> that case the rebalance process can do something like:
>>>
>>> 1. Acquire a read delegation for the entire file.
>>> 2. Migrate the entire file.
>>> 3. Remove/unlock/give-back the delegation it has acquired.
>>>
>>> If a recall is issued from the brick (when a write happens from the
>>> mount), it completes the current write to dst (or throws away the read
>>> from src) to maintain atomicity. Before doing the next set of (read,
>>> src) and (write, dst) it tries to reacquire the lock.
>>
>> With delegations this simplifies the normal path, when a file is
>> exclusively handled by rebalance. It also improves the case where a
>> client and rebalance are conflicting on a file, to degrade to mandatory
>> locks by either party.
>>
>> I would prefer we take the delegation route for such needs in the
>> future.
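The delegation route above can be sketched with a toy model (invented names `Deleg` and `tick`; a real delegation would be recalled asynchronously by the brick). The point it demonstrates is the atomicity rule: the in-flight (read, src) + (write, dst) pair completes before the delegation is given back, and the delegation is reacquired before the next pair.

```python
class Deleg:
    """Toy read delegation, recalled once after `recall_after` chunks."""
    def __init__(self, recall_after=None):
        self.recalled = False
        self.acquisitions = 0
        self._recall_after = recall_after
        self._chunks = 0

    def acquire(self):
        self.acquisitions += 1
        self.recalled = False

    def give_back(self):
        pass

    def tick(self):                      # stands in for a brick-side recall
        self._chunks += 1
        if self._chunks == self._recall_after:
            self.recalled = True

def migrate_with_deleg(src, dst, deleg, block=4):
    offset = 0
    deleg.acquire()                      # 1. read delegation on entire file
    while offset < len(src):
        chunk = src[offset:offset + block]        # (read, src)
        dst[offset:offset + len(chunk)] = chunk   # (write, dst) completes
        offset += len(chunk)                      # before any recall answer
        deleg.tick()
        if deleg.recalled:
            deleg.give_back()            # let the application's write through
            deleg.acquire()              # reacquire before the next pair
    deleg.give_back()                    # 3. give back when migration is done
    return bytes(dst)

d = Deleg(recall_after=1)                # recall fires after the first chunk
copied = migrate_with_deleg(b"abcdefgh", bytearray(8), d)
```

In the common case (no conflicting client), the delegation is acquired once for the whole file, which is what makes this route cheaper than a per-read lock/unlock.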
Re: [Gluster-devel] Rebalance data migration and corruption
- Original Message -
> From: "Joe Julian"
> To: "Raghavendra Gowdappa"
> Cc: gluster-devel@gluster.org
> Sent: Monday, February 8, 2016 9:08:45 PM
> Subject: Re: [Gluster-devel] Rebalance data migration and corruption
>
> On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:
> > - Original Message -
> >> From: "Joe Julian"
> >> To: gluster-devel@gluster.org
> >> Sent: Monday, February 8, 2016 12:20:27 PM
> >> Subject: Re: [Gluster-devel] Rebalance data migration and corruption
> >>
> >> Is this in current release versions?
> >
> > Yes. This bug is present in currently released versions. However, it
> > can happen only if writes from the application are happening to a file
> > while it is being migrated. So, vaguely, one can say the probability is
> > low.
>
> Probability is quite high when the volume is used for VM images, which
> many are.

The primary requirement for this corruption is that the file should be
under migration. Given that rebalance is done only during add/remove-brick
scenarios (or maybe as routine housekeeping to make lookups faster), I
said that the probability is lower. However, this will not be the case
with tier, where files can be under constant promotion/demotion because of
access patterns. If there is constant migration, dht too is susceptible to
this bug with similar probability.

> >> On 02/07/2016 07:43 PM, Shyam wrote:
> >>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
> >>>> - Original Message -
> >>>>> From: "Raghavendra Gowdappa"
> >>>>> To: "Sakshi Bansal", "Susant Palai"
> >>>>> Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar
> >>>>> Ranganathan"
> >>>>> Sent: Friday, February 5, 2016 4:32:40 PM
> >>>>> Subject: Re: Rebalance data migration and corruption
> >>>>>
> >>>>> +gluster-devel
> >>>>>
> >>>>>> Hi Sakshi/Susant,
> >>>>>>
> >>>>>> - There is a data corruption issue in migration code. Rebalance
> >>>>>>   process,
> >>>>>>   1. Reads data from src
> >>>>>>   2. Writes (say w1) it to dst
> >>>>>>
> >>>>>>   However, 1 and 2 are not atomic, so another write (say w2) to
> >>>>>>   the same region can happen between 1 and 2. But these two writes
> >>>>>>   can reach dst in the order (w2, w1), resulting in a subtle
> >>>>>>   corruption. This issue is not fixed yet and can cause subtle
> >>>>>>   data corruptions. The fix is simple and involves the rebalance
> >>>>>>   process acquiring a mandatory lock to make 1 and 2 atomic.
> >>>>>
> >>>>> We can make use of the compound fop framework to make sure we don't
> >>>>> suffer a significant performance hit. Following will be the
> >>>>> sequence of operations done by the rebalance process:
> >>>>>
> >>>>> 1. issues a compound (mandatory lock, read) operation on src.
> >>>>> 2. writes this data to dst.
> >>>>> 3. issues unlock of the lock acquired in 1.
> >>>>>
> >>>>> Please co-ordinate with Anuradha for implementation of this
> >>>>> compound fop.
> >>>>>
> >>>>> Following are the issues I see with this approach:
> >>>>> 1. features/locks provides mandatory lock functionality only for
> >>>>>    posix-locks (flock and fcntl based locks). So, mandatory locks
> >>>>>    will be posix-locks which will conflict with locks held by the
> >>>>>    application. So, if an application has held an fcntl/flock,
> >>>>>    migration cannot proceed.
> >>>>
> >>>> We can implement a "special" domain for mandatory internal locks.
> >>>> These locks will behave similar to posix mandatory locks in that
> >>>> conflicting fops (like write, read) are blocked/failed if they are
> >>>> done while a lock is held.
> >>>>
> >>>>> 2. data migration will be les
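The (w2, w1) reordering that makes this corruption possible is easy to show concretely. This is a toy illustration, not Gluster code: the same two writes applied to the destination in the two possible arrival orders, where w1 is the rebalance process's stale copy of the region and w2 is the application's newer data.

```python
def apply_writes(dst, writes):
    """Apply (offset, data) writes to dst in arrival order."""
    for off, data in writes:
        dst[off:off + len(data)] = data
    return bytes(dst)

# w1: rebalance re-writes the region it read earlier ("AAAA", now stale).
# w2: the application writes newer data ("BBBB") to the same region.
# If dst receives (w2, w1), the stale rebalance write silently clobbers
# the application's newer data:
corrupt = apply_writes(bytearray(4), [(0, b"BBBB"), (0, b"AAAA")])

# With the mandatory lock making (read, src) + (write, dst) atomic, w1 is
# guaranteed to land before w2, so the newer data survives:
correct = apply_writes(bytearray(4), [(0, b"AAAA"), (0, b"BBBB")])
```

The fix under discussion does not reorder anything itself; it only ensures the rebalance read-then-write pair cannot interleave with an application write to the same region.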
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:
> - Original Message -
>> From: "Joe Julian"
>> To: gluster-devel@gluster.org
>> Sent: Monday, February 8, 2016 12:20:27 PM
>> Subject: Re: [Gluster-devel] Rebalance data migration and corruption
>>
>> Is this in current release versions?
>
> Yes. This bug is present in currently released versions. However, it can
> happen only if writes from the application are happening to a file while
> it is being migrated. So, vaguely, one can say the probability is low.

Probability is quite high when the volume is used for VM images, which
many are.

>> On 02/07/2016 07:43 PM, Shyam wrote:
>>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
>>>> - Original Message -
>>>> From: "Raghavendra Gowdappa"
>>>> To: "Sakshi Bansal", "Susant Palai"
>>>> Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
>>>> Sent: Friday, February 5, 2016 4:32:40 PM
>>>> Subject: Re: Rebalance data migration and corruption
>>>>
>>>> +gluster-devel
>>>>
>>>> Hi Sakshi/Susant,
>>>>
>>>> - There is a data corruption issue in migration code. Rebalance
>>>>   process,
>>>>   1. Reads data from src
>>>>   2. Writes (say w1) it to dst
>>>>
>>>>   However, 1 and 2 are not atomic, so another write (say w2) to the
>>>>   same region can happen between 1 and 2. But these two writes can
>>>>   reach dst in the order (w2, w1), resulting in a subtle corruption.
>>>>   This issue is not fixed yet and can cause subtle data corruptions.
>>>>   The fix is simple and involves the rebalance process acquiring a
>>>>   mandatory lock to make 1 and 2 atomic.
>>>>
>>>> We can make use of the compound fop framework to make sure we don't
>>>> suffer a significant performance hit. Following will be the sequence
>>>> of operations done by the rebalance process:
>>>>
>>>> 1. issues a compound (mandatory lock, read) operation on src.
>>>> 2. writes this data to dst.
>>>> 3. issues unlock of the lock acquired in 1.
>>>>
>>>> Please co-ordinate with Anuradha for implementation of this compound
>>>> fop.
>>>>
>>>> Following are the issues I see with this approach:
>>>> 1. features/locks provides mandatory lock functionality only for
>>>>    posix-locks (flock and fcntl based locks). So, mandatory locks
>>>>    will be posix-locks which will conflict with locks held by the
>>>>    application. So, if an application has held an fcntl/flock,
>>>>    migration cannot proceed.
>>>>
>>>> We can implement a "special" domain for mandatory internal locks.
>>>> These locks will behave similar to posix mandatory locks in that
>>>> conflicting fops (like write, read) are blocked/failed if they are
>>>> done while a lock is held.
>>>>
>>>> 2. data migration will be less efficient because of an extra unlock
>>>>    (with compound lock + read) or extra lock and unlock (for
>>>>    non-compound fop based implementation) for every read it does from
>>>>    src.
>>>>
>>>> Can we use delegations here? The rebalance process can acquire a
>>>> mandatory-write-delegation (an exclusive lock with a functionality
>>>> that the delegation is recalled when a write operation happens). In
>>>> that case the rebalance process can do something like:
>>>>
>>>> 1. Acquire a read delegation for the entire file.
>>>> 2. Migrate the entire file.
>>>> 3. Remove/unlock/give-back the delegation it has acquired.
>>>>
>>>> If a recall is issued from the brick (when a write happens from the
>>>> mount), it completes the current write to dst (or throws away the
>>>> read from src) to maintain atomicity. Before doing the next set of
>>>> (read, src) and (write, dst) it tries to reacquire the lock.
>>>
>>> With delegations this simplifies the normal path, when a file is
>>> exclusively handled by rebalance. It also improves the case where a
>>> client and rebalance are conflicting on a file, to degrade to
>>> mandatory locks by either party.
>>>
>>> I would prefer we take the delegation route for such needs in the
>>> future.
>>>>
>>>> @Soumyak, can something like this be done with delegations?
>>>>
>>>> @Pranith,
>>>> Afr does transactions for writing to its subvols. Can you suggest any
>>>> optimizations here so that the rebalance process can have a
>>>> transaction for (read, src) and (write, dst) with minimal performance
>>>> overhead?
>>>>
>>>> regards,
>>>> Raghavendra.
>>>>
>>>> Comments?
>>>>
>>>> regards,
>>>> Raghavendra.
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/08/2016 09:13 AM, Shyam wrote:
> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
>> - Original Message -
>> From: "Raghavendra Gowdappa"
>> To: "Sakshi Bansal", "Susant Palai"
>> Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
>> Sent: Friday, February 5, 2016 4:32:40 PM
>> Subject: Re: Rebalance data migration and corruption
>>
>> +gluster-devel
>>
>>> Hi Sakshi/Susant,
>>>
>>> - There is a data corruption issue in migration code. Rebalance
>>>   process,
>>>   1. Reads data from src
>>>   2. Writes (say w1) it to dst
>>>
>>>   However, 1 and 2 are not atomic, so another write (say w2) to the
>>>   same region can happen between 1 and 2. But these two writes can
>>>   reach dst in the order (w2, w1), resulting in a subtle corruption.
>>>   This issue is not fixed yet and can cause subtle data corruptions.
>>>   The fix is simple and involves the rebalance process acquiring a
>>>   mandatory lock to make 1 and 2 atomic.
>>
>> We can make use of the compound fop framework to make sure we don't
>> suffer a significant performance hit. Following will be the sequence of
>> operations done by the rebalance process:
>>
>> 1. issues a compound (mandatory lock, read) operation on src.
>> 2. writes this data to dst.
>> 3. issues unlock of the lock acquired in 1.
>>
>> Please co-ordinate with Anuradha for implementation of this compound
>> fop.
>>
>> Following are the issues I see with this approach:
>> 1. features/locks provides mandatory lock functionality only for
>>    posix-locks (flock and fcntl based locks). So, mandatory locks will
>>    be posix-locks, which will conflict with locks held by the
>>    application. So, if an application has held an fcntl/flock,
>>    migration cannot proceed.

What if the file is opened with O_NONBLOCK? Can't the rebalance process
skip the file and continue in case mandatory lock acquisition fails?

>> We can implement a "special" domain for mandatory internal locks.
>> These locks will behave similar to posix mandatory locks in that
>> conflicting fops (like write, read) are blocked/failed if they are done
>> while a lock is held.

So is the only difference between mandatory internal locks and posix
mandatory locks that internal locks shall not conflict with other
application locks (advisory/mandatory)?

>> 2. data migration will be less efficient because of an extra unlock
>>    (with compound lock + read) or extra lock and unlock (for
>>    non-compound fop based implementation) for every read it does from
>>    src.
>>
>> Can we use delegations here? The rebalance process can acquire a
>> mandatory-write-delegation (an exclusive lock with a functionality that
>> the delegation is recalled when a write operation happens). In that
>> case the rebalance process can do something like:
>>
>> 1. Acquire a read delegation for the entire file.
>> 2. Migrate the entire file.
>> 3. Remove/unlock/give-back the delegation it has acquired.
>>
>> If a recall is issued from the brick (when a write happens from the
>> mount), it completes the current write to dst (or throws away the read
>> from src) to maintain atomicity. Before doing the next set of (read,
>> src) and (write, dst) it tries to reacquire the lock.
>
> With delegations this simplifies the normal path, when a file is
> exclusively handled by rebalance. It also improves the case where a
> client and rebalance are conflicting on a file, to degrade to mandatory
> locks by either party.
>
> I would prefer we take the delegation route for such needs in the
> future.

Right. But if there are simultaneous accesses to the same file from any
other client and the rebalance process, delegations shall not be granted,
or revoked if granted, even though they are operating at different
offsets. So if you rely only on delegations, migration may not proceed if
an application has held a lock or is doing any I/Os.

Also, ideally the rebalance process has to take a write delegation, as it
would end up writing the data on the destination brick, which shall
affect READ I/Os (though of course we can have special checks/hacks for
internally generated fops). That said, having delegations shall
definitely ensure correctness with respect to exclusive file access.

Thanks,
Soumya

>> @Soumyak, can something like this be done with delegations?
>>
>> @Pranith,
>> Afr does transactions for writing to its subvols. Can you suggest any
>> optimizations here so that the rebalance process can have a transaction
>> for (read, src) and (write, dst) with minimal performance overhead?
>>
>> regards,
>> Raghavendra.
>>
>> Comments?
>>
>> regards,
>> Raghavendra.
Re: [Gluster-devel] Rebalance data migration and corruption
- Original Message - > From: "Joe Julian" > To: gluster-devel@gluster.org > Sent: Monday, February 8, 2016 12:20:27 PM > Subject: Re: [Gluster-devel] Rebalance data migration and corruption > > Is this in current release versions? Yes. This bug is present in currently released versions. However, it can happen only if writes from application are happening to a file when it is being migrated. So, vaguely one can say probability is less. > > On 02/07/2016 07:43 PM, Shyam wrote: > > On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote: > >> > >> > >> - Original Message - > >>> From: "Raghavendra Gowdappa" > >>> To: "Sakshi Bansal" , "Susant Palai" > >>> > >>> Cc: "Gluster Devel" , "Nithya > >>> Balachandran" , "Shyamsundar > >>> Ranganathan" > >>> Sent: Friday, February 5, 2016 4:32:40 PM > >>> Subject: Re: Rebalance data migration and corruption > >>> > >>> +gluster-devel > >>> > >>>> > >>>> Hi Sakshi/Susant, > >>>> > >>>> - There is a data corruption issue in migration code. Rebalance > >>>> process, > >>>>1. Reads data from src > >>>>2. Writes (say w1) it to dst > >>>> > >>>>However, 1 and 2 are not atomic, so another write (say w2) to > >>>> same region > >>>>can happen between 1. But these two writes can reach dst in the > >>>> order > >>>>(w2, > >>>>w1) resulting in a subtle corruption. This issue is not fixed > >>>> yet and can > >>>>cause subtle data corruptions. The fix is simple and involves > >>>> rebalance > >>>>process acquiring a mandatory lock to make 1 and 2 atomic. > >>> > >>> We can make use of compound fop framework to make sure we don't > >>> suffer a > >>> significant performance hit. Following will be the sequence of > >>> operations > >>> done by rebalance process: > >>> > >>> 1. issues a compound (mandatory lock, read) operation on src. > >>> 2. writes this data to dst. > >>> 3. issues unlock of lock acquired in 1. > >>> > >>> Please co-ordinate with Anuradha for implementation of this compound > >>> fop. 
> >>> > >>> Following are the issues I see with this approach: > >>> 1. features/locks provides mandatory lock functionality only for > >>> posix-locks > >>> (flock and fcntl based locks). So, mandatory locks will be > >>> posix-locks which > >>> will conflict with locks held by application. So, if an application > >>> has held > >>> an fcntl/flock, migration cannot proceed. > >> > >> We can implement a "special" domain for mandatory internal locks. > >> These locks will behave similar to posix mandatory locks in that > >> conflicting fops (like write, read) are blocked/failed if they are > >> done while a lock is held. > >> > >>> 2. data migration will be less efficient because of an extra unlock > >>> (with > >>> compound lock + read) or extra lock and unlock (for non-compound fop > >>> based > >>> implementation) for every read it does from src. > >> > >> Can we use delegations here? Rebalance process can acquire a > >> mandatory-write-delegation (an exclusive lock with a functionality > >> that delegation is recalled when a write operation happens). In that > >> case rebalance process, can do something like: > >> > >> 1. Acquire a read delegation for entire file. > >> 2. Migrate the entire file. > >> 3. Remove/unlock/give-back the delegation it has acquired. > >> > >> If a recall is issued from brick (when a write happens from mount), > >> it completes the current write to dst (or throws away the read from > >> src) to maintain atomicity. Before doing next set of (read, src) and > >> (write, dst) tries to reacquire lock. > > > > With delegations this simplifies the normal path, when a file is > > exclusively handled by rebalance. It also improves the case where a > > client and rebalance are conflicting on a file, to degrade to > > mandatory locks by either parties. > > > > I would prefer we take the delegation route for such needs in the future. > > > >> > >> @Soumyak, can something like this be done with delegations? 
> >>
> >> @Pranith,
> >> AFR does transactions for writing to its subvols. Can you suggest any
> >> optimizations here so that the rebalance process can have a
> >> transaction for (read, src) and (write, dst) with minimal performance
> >> overhead?
> >>
> >> regards,
> >> Raghavendra.
> >>
> >>> Comments?
> >>>
> >>>> regards,
> >>>> Raghavendra.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel